
[telemetry_chargeback] Add ability to retrieve loki data #364

Closed
ayefimov-1 wants to merge 0 commits into master from alexy_ck_job_4

@ayefimov-1
Contributor

@ayefimov-1 ayefimov-1 commented Apr 23, 2026

Add the ability to retrieve Loki data from the database.

  • Pulls data from Loki
  • Performs basic validation of the retrieved Loki data

Authored-by: @myadla
Co-authored-by: @ayefimov-1
AI Assisted by: Claude

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/620c0a0fec404d8c8194612f301df484

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 04m 59s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 33m 42s
✔️ feature-verification-tests-noop SUCCESS in 5s
functional-tests-osp18 FAILURE in 1h 47m 22s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9cd7283d6ea54221bd2578923299f1f7

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 59m 38s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 43m 26s
✔️ feature-verification-tests-noop SUCCESS in 5s
functional-tests-osp18 FAILURE in 31m 54s

@ayefimov-1
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4414bd5301894ccf9d2e88e2b47a136e

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 01m 54s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 45m 19s
✔️ feature-verification-tests-noop SUCCESS in 5s
functional-tests-osp18 FAILURE in 1h 41m 17s

@ayefimov-1 ayefimov-1 force-pushed the alexy_ck_job_4 branch 2 times, most recently from e9fff87 to 8e0c235 on April 24, 2026 12:49
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/44b8b161f283410399414b746ccd147f

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 05m 36s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 44m 08s
✔️ feature-verification-tests-noop SUCCESS in 5s
functional-tests-osp18 FAILURE in 1h 48m 44s

@ayefimov-1
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f90637c8255049bc942787bcb7534488

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 58m 17s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 41m 59s
✔️ feature-verification-tests-noop SUCCESS in 4s
functional-tests-osp18 FAILURE in 1h 40m 01s

@ayefimov-1 ayefimov-1 force-pushed the alexy_ck_job_4 branch 2 times, most recently from 337912e to b1f9d37 on April 24, 2026 18:53
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/91213f1de0fe4530b5bb0abf7c9ba086

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 07m 25s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 51m 48s
✔️ feature-verification-tests-noop SUCCESS in 5s
functional-tests-osp18 FAILURE in 1h 40m 12s

@ayefimov-1
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4d94e80c2eac437a8b1f0b08ec1be0da

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 52m 00s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 36m 42s
✔️ feature-verification-tests-noop SUCCESS in 4s
functional-tests-osp18 FAILURE in 1h 28m 38s

@ayefimov-1
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/df2cf9db2f6b4c06bf591ebddaf0f591

✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 10m 18s
✔️ functional-chargeback-tests-osp18 SUCCESS in 1h 52m 06s
✔️ feature-verification-tests-noop SUCCESS in 5s
functional-tests-osp18 FAILURE in 1h 42m 01s

@ayefimov-1
Contributor Author

recheck

@ayefimov-1 ayefimov-1 changed the base branch from master to alexy_db_download_tasks_files May 1, 2026 14:00
@ayefimov-1 ayefimov-1 requested a review from myadla May 1, 2026 14:17
myadla previously approved these changes May 1, 2026
Contributor

@myadla myadla left a comment


lgtm

@ayefimov-1 ayefimov-1 changed the base branch from alexy_db_download_tasks_files to alexy_ck_job_3 May 1, 2026 18:25
@ayefimov-1 ayefimov-1 force-pushed the alexy_ck_job_4 branch 2 times, most recently from f83a87d to 81dcb03 on May 4, 2026 13:54
danpawlik previously approved these changes May 5, 2026
@ayefimov-1 ayefimov-1 changed the title from [telemetry_chargeback] Add ability to ingest and retrieve loki data to [telemetry_chargeback] Add ability to retrieve loki data May 5, 2026
@ayefimov-1 ayefimov-1 requested a review from elfiesmelfie May 5, 2026 19:39
```yaml
- loki_response.status == 200
- loki_response.json.status == 'success'
- loki_response.json.data.result | length > 0
- (loki_response.json.data.result | map(attribute='values') | map('length') | sum) >= (synth_data_rates.data_log.log_count | int)
```
Contributor

@vyzigold vyzigold May 5, 2026


I'm pretty sure I've pointed out in previous PRs that this condition may have issues when multiple test scenarios are executed one after another. Was this tested with multiple test scenarios?

What happens if:

1. test_static executes.
   • As part of that, let's say 10 logs get pushed into Loki.
   • Ansible waits on this task until all 10 logs are in Loki. This is correct and everything works.
2. test_bin_basic then starts executing.
   • As part of that, another 10 logs get pushed into Loki, on top of the 10 already there from the previous scenario.
   • This task succeeds immediately and Ansible moves on, because Loki already holds 10 logs from the previous scenario, which is >= the log_count of the current scenario. It doesn't care that some of the current scenario's logs haven't been returned yet.

Contributor


I know I've seen that issue before, and maybe you've fixed it somehow, but I may have missed the fix.

Contributor


Does that mean the condition on L38 should also be changed?

Contributor Author

@ayefimov-1 ayefimov-1 May 6, 2026


This has been run with multiple scenarios and does not seem to be an issue, at least for now. The job pushes Loki data to the DB, then pulls the data back from the DB, and finally compares the pushed and pulled data. If they differ, the job fails.

Having said all of that, to your point, I don't understand why it does not fail.

What I would prefer is a query bounded by a beginning and an ending time step:

```yaml
url: "{{ loki_query_url }}?query={{ logql_query | urlencode }}&start={{ synth_data_rates.time.begin_step.nanosec }}&end={{ synth_data_rates.time.end_step.nanosec }}&limit={{ limit }}"
```
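A minimal sketch of how that bounded query could slot into the polling task, assuming `loki_query_url` points at Loki's `/loki/api/v1/query_range` endpoint (which accepts `start`/`end` as nanosecond Unix epochs and returns at most `limit` entries). The `until` conditions are the ones quoted at the top of this thread; the task name and the `retries`/`delay` values are hypothetical, and the TLS/client-certificate options the real task needs are omitted.

```yaml
# Sketch only: assumes loki_query_url is Loki's /loki/api/v1/query_range endpoint.
# Task name, retries and delay are hypothetical; TLS/client-cert options omitted.
- name: Query Loki for logs inside this scenario's time window
  ansible.builtin.uri:
    url: "{{ loki_query_url }}?query={{ logql_query | urlencode }}&start={{ synth_data_rates.time.begin_step.nanosec }}&end={{ synth_data_rates.time.end_step.nanosec }}&limit={{ limit }}"
    method: GET
    return_content: true
    status_code: 200
  register: loki_response
  # Only entries timestamped inside [start, end] are returned, so logs pushed
  # by an earlier scenario no longer count toward this scenario's log_count.
  until:
    - loki_response.status == 200
    - loki_response.json.status == 'success'
    - loki_response.json.data.result | length > 0
    - (loki_response.json.data.result | map(attribute='values') | map('length') | sum) >= (synth_data_rates.data_log.log_count | int)
  retries: 10  # hypothetical
  delay: 30    # hypothetical
```

Two caveats apply to this approach: because `limit` caps the number of entries Loki returns, `limit` has to be at least `log_count` or the count check can never pass; and, as noted below, the window only isolates scenarios whose time ranges do not overlap.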

Contributor


That query makes sense, especially if the data from the scenarios don't overlap in time. But if you've done tests and, as you say, the data retrieved from Loki matches what was pushed, then we're probably OK.

Contributor Author

@ayefimov-1 ayefimov-1 May 13, 2026


@vyzigold Or another approach:

Each test scenario gets its own isolated Loki label, preventing cross-contamination when multiple scenarios run sequentially. I can also distinguish multiple runs of the same scenario by a start label.
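A minimal sketch of the per-scenario label idea, using the JSON body format of Loki's `/loki/api/v1/push` API (`streams[].stream` holds the labels, `streams[].values` holds `["<ns timestamp>", "<log line>"]` pairs). The label names `scenario` and `run_start`, and the variables `scenario_name` and `synthetic_log_values`, are hypothetical and not taken from the role; `logql_query` and `synth_data_rates.time.begin_step.nanosec` are the names already used in this thread.

```yaml
# Sketch only: label names (scenario, run_start) and the variables
# scenario_name / synthetic_log_values are hypothetical.
- name: Define the labels that isolate this scenario run
  ansible.builtin.set_fact:
    scenario_labels:
      service: cloudkitty
      scenario: "{{ scenario_name }}"
      # The run's start time distinguishes repeated runs of the same scenario.
      run_start: "{{ synth_data_rates.time.begin_step.nanosec | string }}"

- name: Build the push body so the synthetic logs land in that stream
  ansible.builtin.set_fact:
    loki_push_body:
      streams:
        - stream: "{{ scenario_labels }}"
          values: "{{ synthetic_log_values }}"  # list of ["<ns timestamp>", "<log line>"]

- name: Restrict the retrieval query to this scenario's stream
  ansible.builtin.set_fact:
    logql_query: >-
      {service="{{ scenario_labels.service }}", scenario="{{ scenario_labels.scenario }}", run_start="{{ scenario_labels.run_start }}"}
```

With the selector pinned to one run, the existing count check only ever sees that run's entries. Because `run_start` is unique per run, each run creates a new Loki stream; that is cheap at CI volumes, but it is the kind of high-cardinality label Loki's documentation advises against in long-lived deployments.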

Contributor


That sounds like a good solution to me.

Comment thread on roles/telemetry_chargeback/README.md (outdated)
Comment on lines 43 to 63
| `openstack_cmd` | `openstack` | The command used to execute OpenStack CLI calls. This can be customized if the binary is not in the standard PATH. |
| `cloudkitty_debug` | `false` | Enable debug mode for the role. |
| `cloudkitty_debug_dir` | `{{ (cloudkitty_debug \| bool) \| ternary(artifacts_dir_zuul + '/debug_ck_db', '') }}` | Directory for debug output (auto-set based on cloudkitty_debug flag). |
| `logs_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/logs` | Directory for log files. |
| `artifacts_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/artifacts` | Directory for generated artifacts. |
| `cert_dir` | `{{ ansible_user_dir }}/ck-certs` | Local directory for extracted ingest/query certs. |
| `local_cert_dir` | `{{ ansible_env.HOME }}/ci-framework-data/flush_certs` | Local directory for flush certs (removed by cleanup_ck.yml after the run). |
| `remote_cert_dir` | `osp-certs` | Directory inside the OpenStack pod for certs. |
| `cert_secret_name` | `cert-cloudkitty-client-internal` | OpenShift secret name for client certificates. |
| `client_secret` | `secret/cloudkitty-lokistack-gateway-client-http` | Secret for flush client certs. |
| `ca_configmap` | `cm/cloudkitty-lokistack-ca-bundle` | ConfigMap for CA bundle. |
| `logql_query` | `{service="cloudkitty"}` (overridable via `loki_query`) | LogQL query for Loki. |
| `cloudkitty_namespace` | `openstack` | OpenShift namespace for Cloudkitty/Loki resources. |
| `openstackpod` | `openstackclient` | OpenStack client pod name for exec/cp. |
| `lookback` | `6` | Days lookback for Loki query time range. |
| `limit` | `50` | Limit for Loki query results. |
| `cloudkitty_test_scenarios` | `[]` | List of test scenario files to run (default: auto-discover all `test_*.yml` files). |

**Example: Overriding variables when importing the role**
```yaml
- name: "Run chargeback tests"
  ansible.builtin.import_role:
    name: telemetry_chargeback
  vars:
    cloudkitty_namespace: "my-custom-namespace"
    lookback: 10
    cloudkitty_debug: true
```
| `openstack_cmd` | `"openstack"` | OpenStack CLI command (customize if not in PATH) |
| `cloudkitty_debug` | `false` | Enable debug mode for CloudKitty operations |
| `cloudkitty_debug_dir` | `"{{ (cloudkitty_debug \| bool) \| ternary(artifacts_dir_zuul + '/debug_ck_db', '') }}"` | Directory for debug output (auto-set based on debug flag) |
| `logs_dir_zuul` | `"{{ cifmw_basedir }}/logs"` | Directory for log files |
| `artifacts_dir_zuul` | `"{{ cifmw_basedir }}/artifacts"` | Directory for generated artifacts and test output |
| `cert_dir` | `"{{ ansible_user_dir }}/ck-certs"` | Directory for CloudKitty client certificates |
| `local_cert_dir` | `"{{ cifmw_basedir }}/flush_certs"` | Local directory for flush certificates (cleaned up after run) |
| `remote_cert_dir` | `"osp-certs"` | Remote directory inside OpenStack pod for certificates |
| `cert_secret_name` | `"cert-cloudkitty-client-internal"` | OpenShift secret name for client certificates |
| `client_secret` | `"secret/cloudkitty-lokistack-gateway-client-http"` | Secret for flush client certificates |
| `ca_configmap` | `"cm/cloudkitty-lokistack-ca-bundle"` | ConfigMap for CA bundle |
| `logql_query` | `"{service=\"cloudkitty\"}"` | LogQL query for Loki (overridable via `loki_query`) |
| `cloudkitty_namespace` | `"openstack"` | Kubernetes namespace where CloudKitty is deployed |
| `openstackpod` | `"openstackclient"` | OpenStack client pod name for exec/cp operations |
| `lookback` | `6` | Days to look back for Loki query time range |
| `limit` | `50` | Limit for Loki query results |
| `cloudkitty_test_scenarios` | `[]` | List of test scenario files to run (empty = auto-discover) |

Collaborator


These changes to this part of the README revert the changes made in the parent PR, i.e., they restore the state from two PRs ago.

This needs to be addressed.

@ayefimov-1 ayefimov-1 force-pushed the alexy_ck_job_3 branch 6 times, most recently from 95d9fe5 to a58e1c1 on May 17, 2026 19:25
Base automatically changed from alexy_ck_job_3 to master May 20, 2026 13:47
@ayefimov-1 ayefimov-1 dismissed stale reviews from danpawlik and myadla May 20, 2026 13:47

The base branch was changed.

@ayefimov-1 ayefimov-1 closed this May 21, 2026

5 participants