[telemetry_chargeback] Add ability to retrieve loki data#364
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/620c0a0fec404d8c8194612f301df484 ✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 04m 59s |
- loki_response.status == 200
- loki_response.json.status == 'success'
- loki_response.json.data.result | length > 0
- (loki_response.json.data.result | map(attribute='values') | map('length') | sum) >= (synth_data_rates.data_log.log_count | int)
I'm pretty sure I pointed out in previous PRs that this condition may have issues when multiple test scenarios are executed one after another. Was it tested with multiple test scenarios?
What happens if:
1. test_static executes
2. As part of that, let's say 10 logs get pushed into Loki
3. Ansible waits on this task until all 10 logs are inside Loki. This is correct and all works
4. Now test_bin_basic starts executing
5. As part of that, another 10 logs get pushed into Loki on top of the 10 that are already there from the previous scenario
6. This task immediately succeeds and Ansible moves on, because we already have 10 logs in Loki from the previous scenario, which is >= log_count of the current scenario. It doesn't care that some of the logs from the current scenario haven't been returned yet.
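To make the scenario concrete, here is a minimal sketch of the check evaluated against a hand-written stand-in for the Loki response. The play and the names `stale_result` and `expected_log_count` are made up for illustration; only the filter chain is taken from the condition under review.

```yaml
# Illustration only: `stale_result` is a hand-written stand-in, not real Loki output.
- name: Show that a cumulative count check can pass on stale data
  hosts: localhost
  gather_facts: false
  vars:
    # One stream holding only the 10 entries left over from test_static.
    stale_result:
      - values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # test_bin_basic also expects 10 entries, none of which have been returned yet.
    expected_log_count: 10
  tasks:
    - name: The count check is already satisfied before any new log arrives
      ansible.builtin.assert:
        that:
          - (stale_result | map(attribute='values') | map('length') | sum) >= (expected_log_count | int)
```

The assertion should hold even though none of the counted entries belong to the current scenario, which is the situation described in step 6.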
I know I've seen that issue before and maybe you've fixed it somehow, but I may have missed the fix
Does that mean the condition in L38 should also be changed?
This has been run with multiple scenarios and does not seem to be an issue, at least for now. The job pushes Loki data to the DB, then pulls data from the DB, and finally compares the pushed and pulled data. If they differ, the job fails.
Having said all of that, to your point, I don't understand why it does not fail.
What I would prefer is a query bounded by a start and an end timestamp:
url: "{{ loki_query_url }}?query={{ logql_query | urlencode }}&start={{ synth_data_rates.time.begin_step.nanosec }}&end={{ synth_data_rates.time.end_step.nanosec }}&limit={{ limit }}"
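As a rough sketch of how that URL could sit in the polling task, assuming `loki_query_url` points at Loki's `query_range` endpoint and reusing the conditions from this diff. The task name and the `retries` and `delay` values are placeholders, and the TLS options the real task needs are left out:

```yaml
# Sketch only: task name, retries and delay are assumptions; TLS options are omitted.
- name: Wait until Loki returns the logs from this scenario's time window
  ansible.builtin.uri:
    url: "{{ loki_query_url }}?query={{ logql_query | urlencode }}&start={{ synth_data_rates.time.begin_step.nanosec }}&end={{ synth_data_rates.time.end_step.nanosec }}&limit={{ limit }}"
    method: GET
    return_content: true
  register: loki_response
  until:
    - loki_response.status == 200
    - loki_response.json.status == 'success'
    - loki_response.json.data.result | length > 0
    # Only entries inside [start, end] are counted, so logs from earlier scenarios
    # no longer satisfy the check, provided the scenario windows do not overlap.
    # Loki returns at most `limit` entries, so `limit` must be at least log_count.
    - (loki_response.json.data.result | map(attribute='values') | map('length') | sum) >= (synth_data_rates.data_log.log_count | int)
  retries: 30  # placeholder value
  delay: 10    # placeholder value
```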
That query makes sense, especially if the data from the scenarios don't overlap in time. But if you've run tests and, as you say, the data retrieved from Loki matches what was pushed, then we're probably OK.
@vyzigold Or another approach ...
Each test scenario gets its own isolated Loki label, preventing cross-contamination when multiple scenarios run sequentially. And I can differentiate multiple runs of the same scenario by a start label.
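Roughly, the stream selector could then look like the sketch below. The label names `scenario` and `start` and the variable `scenario_name` are placeholders, not what the role currently uses, and the same labels would have to be attached when the logs are pushed:

```yaml
# Sketch only: `scenario`, `start` and `scenario_name` are hypothetical names.
# The selector matches only streams pushed by this scenario in this particular run.
logql_query: '{service="cloudkitty", scenario="{{ scenario_name }}", start="{{ synth_data_rates.time.begin_step.nanosec }}"}'
```

With a selector like this, the existing count check only sees entries from the current scenario and run, so leftovers from earlier scenarios cannot satisfy it.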
That sounds like a good solution to me.
| `openstack_cmd` | `openstack` | The command used to execute OpenStack CLI calls. This can be customized if the binary is not in the standard PATH. |
| `cloudkitty_debug` | `false` | Enable debug mode for the role. |
| `cloudkitty_debug_dir` | `{{ (cloudkitty_debug \| bool) \| ternary(artifacts_dir_zuul + '/debug_ck_db', '') }}` | Directory for debug output (auto-set based on cloudkitty_debug flag). |
| `logs_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/logs` | Directory for log files. |
| `artifacts_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/artifacts` | Directory for generated artifacts. |
| `cert_dir` | `{{ ansible_user_dir }}/ck-certs` | Local directory for extracted ingest/query certs. |
| `local_cert_dir` | `{{ ansible_env.HOME }}/ci-framework-data/flush_certs` | Local directory for flush certs (removed by cleanup_ck.yml after the run). |
| `remote_cert_dir` | `osp-certs` | Directory inside the OpenStack pod for certs. |
| `cert_secret_name` | `cert-cloudkitty-client-internal` | OpenShift secret name for client certificates. |
| `client_secret` | `secret/cloudkitty-lokistack-gateway-client-http` | Secret for flush client certs. |
| `ca_configmap` | `cm/cloudkitty-lokistack-ca-bundle` | ConfigMap for CA bundle. |
| `logql_query` | `{service="cloudkitty"}` (overridable via `loki_query`) | LogQL query for Loki. |
| `cloudkitty_namespace` | `openstack` | OpenShift namespace for Cloudkitty/Loki resources. |
| `openstackpod` | `openstackclient` | OpenStack client pod name for exec/cp. |
| `lookback` | `6` | Days lookback for Loki query time range. |
| `limit` | `50` | Limit for Loki query results. |
| `cloudkitty_test_scenarios` | `[]` | List of test scenario files to run (default: auto-discover all `test_*.yml` files). |

**Example: Overriding variables when importing the role**
```yaml
- name: "Run chargeback tests"
  ansible.builtin.import_role:
    name: telemetry_chargeback
  vars:
    cloudkitty_namespace: "my-custom-namespace"
    lookback: 10
    cloudkitty_debug: true
```
| `openstack_cmd` | `"openstack"` | OpenStack CLI command (customize if not in PATH) |
| `cloudkitty_debug` | `false` | Enable debug mode for CloudKitty operations |
| `cloudkitty_debug_dir` | `"{{ (cloudkitty_debug \| bool) \| ternary(artifacts_dir_zuul + '/debug_ck_db', '') }}"` | Directory for debug output (auto-set based on debug flag) |
| `logs_dir_zuul` | `"{{ cifmw_basedir }}/logs"` | Directory for log files |
| `artifacts_dir_zuul` | `"{{ cifmw_basedir }}/artifacts"` | Directory for generated artifacts and test output |
| `cert_dir` | `"{{ ansible_user_dir }}/ck-certs"` | Directory for CloudKitty client certificates |
| `local_cert_dir` | `"{{ cifmw_basedir }}/flush_certs"` | Local directory for flush certificates (cleaned up after run) |
| `remote_cert_dir` | `"osp-certs"` | Remote directory inside OpenStack pod for certificates |
| `cert_secret_name` | `"cert-cloudkitty-client-internal"` | OpenShift secret name for client certificates |
| `client_secret` | `"secret/cloudkitty-lokistack-gateway-client-http"` | Secret for flush client certificates |
| `ca_configmap` | `"cm/cloudkitty-lokistack-ca-bundle"` | ConfigMap for CA bundle |
| `logql_query` | `"{service=\"cloudkitty\"}"` | LogQL query for Loki (overridable via `loki_query`) |
| `cloudkitty_namespace` | `"openstack"` | Kubernetes namespace where CloudKitty is deployed |
| `openstackpod` | `"openstackclient"` | OpenStack client pod name for exec/cp operations |
| `lookback` | `6` | Days to look back for Loki query time range |
| `limit` | `50` | Limit for Loki query results |
| `cloudkitty_test_scenarios` | `[]` | List of test scenario files to run (empty = auto-discover) |
These changes to this part of the README revert the changes made in the parent PR, i.e. they restore the state from two PRs ago.
This needs to be addressed.
The base branch was changed.
Add the ability to retrieve loki data from database
Authored-by: @myadla
Co-authored-by: @ayefimov-1
AI Assisted by: Claude