Finalize OTLP-only observability and Grafana dashboard fixes #49
Pull request overview
This PR finalizes an OTLP-first observability setup: traces and logs are exported from the Spring Boot app to an OpenTelemetry Collector (Tempo for traces, Loki for logs), removing Promtail from the stack and updating Grafana provisioning accordingly.
Changes:
- Switched Spring Boot observability configuration to OTLP exporters for tracing + logging, and added Logback → OpenTelemetry log bridging.
- Updated OTel Collector config to export logs to Loki and traces to Tempo; removed Promtail from Docker Compose/config docs.
- Refreshed Grafana datasources/dashboards (LogQL/TraceQL fixes, filtering /actuator/prometheus, new labels/derived fields), and added/expanded tests around log correlation + auth counters.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/test/java/lt/satsyuk/service/KeycloakAuthServiceTest.java | Adds meter-tag regression tests for auth success/failure counters. |
| src/test/java/lt/satsyuk/observability/LoggingCorrelationTest.java | Adds a Spring Boot test to verify trace/span correlation appears in logs. |
| src/main/resources/logback-spring.xml | Adds OpenTelemetry Logback appender alongside JSON console logging. |
| src/main/resources/application.properties | Updates management OTLP/OTel properties for trace + log exporting. |
| src/main/java/lt/satsyuk/config/OtelLogbackConfig.java | Installs/binds the OpenTelemetry Logback appender to Spring’s OTel instance. |
| README.md | Documents the OTLP-first observability flow and recommended properties. |
| promtail-config.yaml | Removes Promtail configuration (no longer used). |
| pom.xml | Replaces explicit tracing deps with Spring Boot OTel starter; adds OTel Logback appender dependency. |
| otel.yaml | Adds Loki exporter + log pipeline to the OTel Collector config. |
| KODA.md | Updates project structure docs to remove Promtail. |
| grafana/provisioning/datasources/tempo.yaml | Adjusts Tempo→Loki traces-to-logs mapping for OTLP/Loki labels. |
| grafana/provisioning/datasources/loki.yaml | Updates derived-field regex for linking logs to traces via traceId. |
| grafana/provisioning/dashboards/traces-dashboard.json | Updates TraceQL queries to exclude /actuator/prometheus noise. |
| grafana/provisioning/dashboards/logs-dashboard.json | Updates LogQL selectors to prefer OTLP-ingested labels (with fallback). |
| grafana/provisioning/dashboards/application-metrics-dashboard.json | Improves PromQL robustness (fallbacks, divide-by-zero protection, more portable metrics). |
| grafana/provisioning/dashboards/app-metrics.json | Updates log panels/queries for OTLP label scheme and formatting. |
| docker-compose.yaml | Routes app traces/logs to OTel Collector; removes Promtail service. |
| CHANGELOG.md | Updates stack description to reflect Promtail removal. |
  },
  "editorMode": "code",
- "expr": "sum by(level) (rate({job=\"jwt-demo\"} |= \"$log_keyword\" | json | level != \"\" [1m]))",
+ "expr": "sum by(level) (rate({service_name=\"jwt-demo\",level=~\".+\"} |= \"$log_keyword\" != \"/actuator/prometheus\" [1m])) + sum by(level) (rate({job=~\".*jwt-demo.*\",level=~\".+\"} |= \"$log_keyword\" != \"/actuator/prometheus\" [1m]))",
The LogQL metric query combines two `sum by(level) (rate(...))` aggregations with `+`. In PromQL/LogQL vector arithmetic, a binary operator only emits results for label sets present on both sides, so if one side returns no series (e.g., the `{job=~".*jwt-demo.*"...}` selector is empty once Promtail is removed), the entire `A + B` result is empty and the panel shows no data even though the OTLP stream has logs. Consider unioning the streams before aggregating (e.g., `sum by(level)(rate(selectorA[1m]) or rate(selectorB[1m]))`), or otherwise defaulting the missing side to 0, so the panel stays populated when only one source exists.
Suggested change:
- "expr": "sum by(level) (rate({service_name=\"jwt-demo\",level=~\".+\"} |= \"$log_keyword\" != \"/actuator/prometheus\" [1m])) + sum by(level) (rate({job=~\".*jwt-demo.*\",level=~\".+\"} |= \"$log_keyword\" != \"/actuator/prometheus\" [1m]))",
+ "expr": "sum by(level) (rate({service_name=\"jwt-demo\",level=~\".+\"} |= \"$log_keyword\" != \"/actuator/prometheus\" [1m]) or rate({job=~\".*jwt-demo.*\",level=~\".+\"} |= \"$log_keyword\" != \"/actuator/prometheus\" [1m]))",
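The difference between `A + B` and `A or B` described in the review comment can be illustrated with a small model. This is only a sketch of label-set matching written for this note, not Loki's implementation; the helper names (`labels`, `add`, `union`, `sum_by`) and the sample rates are made up for illustration.

```python
from typing import Dict, Tuple

# A "vector" maps a sorted tuple of (label, value) pairs to a sample value.
Labels = Tuple[Tuple[str, str], ...]
Vector = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable, order-independent label set."""
    return tuple(sorted(kv.items()))


def add(left: Vector, right: Vector) -> Vector:
    """Model of `A + B`: only label sets present on BOTH sides produce output."""
    return {k: left[k] + right[k] for k in left.keys() & right.keys()}


def union(left: Vector, right: Vector) -> Vector:
    """Model of `A or B`: all of A, plus series from B whose label set is not in A."""
    out = dict(left)
    for k, v in right.items():
        out.setdefault(k, v)
    return out


def sum_by(vector: Vector, label: str) -> Vector:
    """Model of `sum by(label) (...)`: group on one label and add the values."""
    out: Vector = {}
    for series, value in vector.items():
        key = labels(**{label: dict(series).get(label, "")})
        out[key] = out.get(key, 0.0) + value
    return out


# Hypothetical per-stream rates: the OTLP stream has data, the Promtail stream is gone.
otlp: Vector = {
    labels(service_name="jwt-demo", level="INFO"): 2.0,
    labels(service_name="jwt-demo", level="ERROR"): 0.5,
}
promtail: Vector = {}

# Original query shape: sum by(level)(A) + sum by(level)(B)
original = add(sum_by(otlp, "level"), sum_by(promtail, "level"))

# Suggested query shape: sum by(level)(A or B)
suggested = sum_by(union(otlp, promtail), "level")

print(original)   # empty: nothing matches on both sides
print(suggested)  # one series per level, taken from the OTLP stream
```

In this model the `+` form returns nothing as soon as either side is empty, while the `or` form degrades to whichever source still has data.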
**Recommended properties for Variant B**

- `management.opentelemetry.tracing.export.otlp.endpoint=${MANAGEMENT_OTLP_TRACING_ENDPOINT:http://localhost:4318/v1/traces}`
- `management.tracing.export.otlp.enabled=true`
- `management.opentelemetry.logging.export.otlp.endpoint=${MANAGEMENT_OTLP_LOGGING_ENDPOINT:http://localhost:4318/v1/logs}`
- `management.logging.export.otlp.enabled=true`
- `management.otlp.metrics.export.enabled=false` (avoid duplicate metric ingestion with Prometheus scrape)
README references “Recommended properties for Variant B”, but there’s no definition of “Variant A/B” anywhere else in the document (this is the only occurrence). Consider renaming this heading to something self-contained (e.g., “Recommended properties”) or adding a short explanation of what “Variant B” means to avoid confusing readers.
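For readers unfamiliar with the `${NAME:default}` form used in the endpoint properties quoted above, the following sketch shows how such a placeholder is expected to resolve: the environment value wins if set, otherwise the text after the first colon is used. It is a simplified model for illustration only, not Spring's actual resolver; the `resolve` helper and the `otel-collector` hostname are assumptions.

```python
import re

# ${NAME} or ${NAME:default}; the name ends at the FIRST colon, so the default
# may itself contain colons (as in "http://localhost:4318/v1/traces").
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value: str, env: dict) -> str:
    """Replace each placeholder with env[NAME], falling back to its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")

    return _PLACEHOLDER.sub(substitute, value)


prop = "${MANAGEMENT_OTLP_TRACING_ENDPOINT:http://localhost:4318/v1/traces}"

# Without the variable, the local default applies.
local = resolve(prop, {})

# With the variable set (hypothetical collector hostname), it overrides the default.
in_compose = resolve(
    prop, {"MANAGEMENT_OTLP_TRACING_ENDPOINT": "http://otel-collector:4318/v1/traces"}
)

print(local)
print(in_compose)
```

This is why the same property line can serve both a local run and a Docker Compose deployment: only the environment variable needs to change.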
Addressed both review notes:

1) Updated the Log Level Rate query in grafana/provisioning/dashboards/app-metrics.json to use an `or` union before `sum by(level)` (instead of `A + B`), so the panel still returns data when the fallback stream is empty.
2) Renamed the README heading from "Recommended properties for Variant B" to "Recommended properties".

Also validated the updated dashboard JSON locally.
What was done
Verification