Draft
36 commits
ffd9963
PMM-14169 PoC for OpenTelemetry Logging
ademidoff Jul 11, 2025
5ca2f13
PMM-14169 Fix issues in alert rules and contact point syntax
ademidoff Jul 11, 2025
56be7f7
PMM-14169 Remove redundant Dockerfile
ademidoff Jul 11, 2025
61276ca
PMM-14169 Fix the mounted file path
ademidoff Jul 11, 2025
7fcaf48
PMM-14169 Apply alert rule fixes and tweaks
ademidoff Jul 11, 2025
e3bdae3
Merge branch 'v3' into PMM-14169-poc-opentelemetry-logging
ademidoff Jul 11, 2025
b59cf2b
PMM-14169 Get rid of NoData emails
ademidoff Jul 11, 2025
f9d1260
PMM-14169 Update the receiver
ademidoff Jul 11, 2025
9432994
PMM-14169 adjust the message format
ademidoff Jul 11, 2025
5e8ee89
PMM-14169 adjust the message format and fix the alert SQL
ademidoff Jul 11, 2025
d6cb2b8
PMM-14169 update the documentation
ademidoff Jul 11, 2025
51d8e63
PMM-14169 update the dir tree
ademidoff Jul 11, 2025
f6b122c
PMM-14169 update the documentation
ademidoff Jul 12, 2025
9d53701
PMM-14169 uncomment cert-generator
ademidoff Jul 12, 2025
816775c
PMM-14169 fix alert query faliures
ademidoff Jul 13, 2025
b67bbc5
PMM-14169 modify contact point and alert rules to fix alert issues
ademidoff Jul 13, 2025
f5e92e6
PMM-14169 add arch diagrams
ademidoff Jul 13, 2025
2843c98
PMM-14169 update arch diagrams
ademidoff Jul 13, 2025
4dacca9
PMM-14169 update the goals
ademidoff Jul 14, 2025
3fcdd5e
PMM-14169 link diagrams to README
ademidoff Jul 14, 2025
2887d24
PMM-14169 add info on how to change the log retention
ademidoff Jul 14, 2025
ec8aded
Merge branch 'v3' into PMM-14169-poc-opentelemetry-logging
ademidoff Jul 16, 2025
f8cff1d
PMM-14169 add two more log receivers
ademidoff Jul 16, 2025
8f3f90c
PMM-14169 add a sample OTel dashboard
ademidoff Jul 16, 2025
fc736bd
PMM-14169 add PostgreSQL logs and alerts
ademidoff Aug 6, 2025
b59cffa
PMM-14169 reduce the rule evaluation time
ademidoff Aug 6, 2025
f05dd83
PMM-14169 clean up Docker compose
ademidoff Aug 27, 2025
26fe472
PMM-14169 update test queries
ademidoff Aug 28, 2025
70e549a
Merge branch 'v3' into PMM-14169-poc-opentelemetry-logging
theTibi Dec 3, 2025
52ac98b
Merge branch 'v3' into PMM-14169-poc-opentelemetry-logging
theTibi Dec 3, 2025
fb88e8a
PMM-14169 Add an otel dashboard
ademidoff Dec 21, 2025
11dadfa
Merge branch 'PMM-14169-poc-opentelemetry-logging' of github.com:perc…
ademidoff Dec 21, 2025
94a83d2
Merge branch 'v3' into PMM-14169-poc-opentelemetry-logging
ademidoff May 21, 2026
59ef34c
PMM-14169 Latest updates and fixes
ademidoff May 21, 2026
128baf6
chore: remove cert generator
ademidoff May 22, 2026
e7c25a9
PMM-14169 Remove ClickHouse OTel dashboard configuration
ademidoff Jun 1, 2026
1,426 changes: 0 additions & 1,426 deletions dev/clickhouse-config.xml

This file was deleted.

6 changes: 6 additions & 0 deletions dev/otel/.env.example
@@ -0,0 +1,6 @@
# PMM OpenTelemetry Environment Configuration
# Copy this file to .env and update the values below

# Email configuration for Grafana SMTP notifications (required)
GF_SMTP_FROM_ADDRESS=admin@yourcompany.com
GF_SECURITY_ADMIN_EMAIL=security@yourcompany.com
234 changes: 234 additions & 0 deletions dev/otel/README.md

Large diffs are not rendered by default.

252 changes: 252 additions & 0 deletions dev/otel/SETUP.md
@@ -0,0 +1,252 @@
# OpenTelemetry Logging Setup Instructions (PoC)

## Project Structure
This project has the following directory structure:

```
├── clickhouse
│   ├── config.d
│   │   └── config-override.xml
│   ├── client-config.xml
│   ├── config-override.xml
│   └── test.sql
├── doc
│   ├── otel-collector.png
│   ├── password-change-failure-alert.png
│   └── password-change-success-alert.png
├── grafana
│   ├── alert-rules.yml
│   ├── change-admin-password
│   ├── clickhouse-datasource.yml
│   ├── contact-points.yml
│   ├── datasources.yml
│   └── notification-policies.yml
├── nginx
│   └── nginx.conf
├── test
│   ├── clickhouse-test.sh
│   └── setup-test.sh
├── .env.example
├── config.yml
├── docker-compose.yml
├── README.md
└── SETUP.md
```

## Setup Steps

### 1. Clone the Project
```bash
git clone https://github.com/percona/pmm.git
cd pmm/dev/otel
```

### 2. Configure Environment
```bash
# Copy the example environment file
cp .env.example .env

# Edit the .env file with your settings
vim .env # or use your preferred editor
```

**Required Environment Variables:**
- `GF_SMTP_FROM_ADDRESS`: Email address for sending alert notifications
- `GF_SECURITY_ADMIN_EMAIL`: Admin email address for Grafana (for sending user invites, etc.)

**Example .env configuration:**
```bash
# Email configuration for Grafana SMTP notifications (required)
GF_SMTP_FROM_ADDRESS=admin@yourcompany.com
GF_SECURITY_ADMIN_EMAIL=security@yourcompany.com
```
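
Before starting the stack, it can help to confirm that `.env` defines both variables with non-empty values. The following is a minimal sketch and not part of the PoC; the function name `check_env_file` is hypothetical, and it assumes plain `NAME=value` lines as in the example above.

```shell
# Hypothetical helper (not part of the PoC): verify that an env file
# defines every required variable with a non-empty value.
check_env_file() {
  env_file=$1
  missing=0
  for var in GF_SMTP_FROM_ADDRESS GF_SECURITY_ADMIN_EMAIL; do
    # Match lines such as VAR=value; an empty value (VAR=) does not count.
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      echo "missing or empty: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_env_file .env && docker compose up -d
```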

### 3. Update Email Addresses for Alerts
Edit the contact points configuration to use your email addresses:
```bash
# Edit the contact points file
vim grafana/contact-points.yml

# Update the addresses with your emails:
# addresses: "admin@yourcompany.com;security@yourcompany.com"
```

### 4. Start the Environment
```bash
# Start all services
docker compose up -d

# Check service status
docker compose ps

# Follow logs (each command streams until interrupted with Ctrl+C, so run them one at a time)
docker compose logs -f cert-generator
docker compose logs -f otel-collector
docker compose logs -f pmm-server
```

### 5. Generate Logs
PMM generates a fair amount of log data during normal use, so after a few moments of interaction you can start exploring the logs. You can also generate a few log lines manually for testing purposes, for example Nginx access logs:

```bash
# Generate various HTTP responses
curl -k -u admin:admin https://localhost/ # 200 OK
curl -k -u admin:admin https://localhost/graph/api/users # 200 OK
curl -k https://localhost/graph/api/users/1 # 401 Unauthorized
curl -k -u admin:admin https://localhost/graph/nonexistent # 404 Not Found
```
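
To see which status code each request actually produced, the same requests can be wrapped in a small loop. This is a sketch only: the function name `probe_paths` is hypothetical, and it assumes the default `admin:admin` credentials and the self-signed certificate (hence `-k`) used in the commands above.

```shell
# Hypothetical helper: request each path under a base URL and print
# "<status> <path>" per request. Uses the PoC's default credentials.
probe_paths() {
  base_url=$1
  shift
  for path in "$@"; do
    # -o /dev/null discards the body; -w prints only the HTTP status code.
    status=$(curl -k -s -o /dev/null -u admin:admin -w '%{http_code}' "${base_url}${path}")
    printf '%s %s\n' "$status" "$path"
  done
}

# Usage: probe_paths https://localhost / /graph/api/users /graph/nonexistent
```

If the server cannot be reached, curl reports the status as `000`, which makes connection problems easy to tell apart from HTTP errors.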

### 6. Access ClickHouse
```bash
# Connect to ClickHouse CLI
docker exec -it pmm-server clickhouse-client --user=default --password=clickhouse --database=otel
```

### 7. Run Test Queries
Execute the test queries from the `clickhouse/test.sql` file in the ClickHouse client.

### 8. Test Security Alerts
To test the admin password change alert, you can change the admin password in Grafana. This will trigger an alert if configured correctly:

```bash
# Use the command line tool:
docker exec -it pmm-server change-admin-password "<new-password>"
```

**Expected behavior:**
- The alert should trigger within 1 minute of the password change
- An email notification should be delivered to the configured addresses
- Check MailHog UI at http://localhost:8025 to see emails sent by triggered alerts

Likewise, to test the admin password change failure alert, you can pass an empty password or one that does not meet the password requirements:

```bash
# Use the command line tool with an empty password
docker exec -it pmm-server change-admin-password ""
```

### 9. Monitor Alert System
```bash
# Check Grafana alerting logs
docker exec -it pmm-server bash
grep "ngalert" /srv/logs/grafana.log

# View alert rules in Grafana UI
# Go to https://localhost:443
# Navigate to Alerting > Alert Rules

# Check contact points and notification policies
# Navigate to Alerting > Contact Points
# Navigate to Alerting > Notification Policies
```

### 10. Adding More Logs
You can add more log sources to PMM Server by modifying the `config.yml` file. To add an external log source, configure the OpenTelemetry Collector to collect logs from that source. To read more, refer to the [OpenTelemetry Collector documentation](https://opentelemetry.io/docs/collector/configuration).

### 11. Changing Log Retention
To change the log retention period, modify the ClickHouse table TTL settings in the `clickhouse/config.d/config-override.xml` file:
```xml
<clickhouse>
  <profiles>
    <default>
      <ttl_only_drop_parts>1</ttl_only_drop_parts>
    </default>
  </profiles>
  <tables>
    <otel.logs>
      <ttl>
        <column name="TimestampTime" unit="day" value="7"/>
      </ttl>
    </otel.logs>
  </tables>
</clickhouse>
```

Alternatively, you can also change the TTL by running the following query using the ClickHouse client:
```sql
ALTER TABLE otel.logs MODIFY TTL TimestampTime + INTERVAL 7 DAY;
```
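
To apply a different retention period without editing the statement by hand, the query can be generated from a number of days. This is a minimal sketch: the function name `ttl_query` is hypothetical, and the container name and credentials in the usage line are the ones used in step 6.

```shell
# Hypothetical helper: build the ALTER TABLE statement for a retention
# period given in days.
ttl_query() {
  days=$1
  # Accept only positive integers without leading zeros.
  case $days in
    ''|*[!0-9]*|0*)
      echo "retention must be a positive integer" >&2
      return 1
      ;;
  esac
  printf 'ALTER TABLE otel.logs MODIFY TTL TimestampTime + INTERVAL %s DAY\n' "$days"
}

# Usage:
# docker exec -i pmm-server clickhouse-client --user=default --password=clickhouse -q "$(ttl_query 14)"
```

Validating the argument first keeps anything other than a plain number from being interpolated into the SQL statement.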

### 12. Creating Dashboards
You can create custom dashboards in PMM to visualize the logs. Use the `ClickHouse-Logs` data source to query the `otel.logs` table and create panels for different log types, such as Nginx access logs, Grafana logs, pmm-managed logs, pmm-agent logs, and more.

#### Example Query for Log Linecount by Service
Panel Description: Log Linecount by Service
```sql
SELECT ServiceName AS service, COUNT(*) AS count
FROM "otel"."logs"
WHERE ( Timestamp >= $__fromTime AND Timestamp <= $__toTime )
GROUP BY service
ORDER BY count DESC;
```

#### Example Query for Nginx Status by Severity
Panel Description: Nginx Status by Severity
```sql
SELECT
CASE WHEN LogAttributes['status'] = '' THEN 'N/A' ELSE LogAttributes['status'] END AS mapping, COUNT(*) AS count
FROM otel.logs
WHERE ( Timestamp >= $__fromTime AND Timestamp <= $__toTime ) AND ServiceName = 'nginx'
GROUP BY LogAttributes['status'], SeverityNumber
ORDER BY LogAttributes['status']
```

#### Example Query for General Logs
Panel Description: General Logs
```sql
SELECT Timestamp as "timestamp", Body as "body", SeverityText as "level", LogAttributes as "labels", TraceId as "traceID" FROM "otel"."logs" WHERE ( timestamp >= $__fromTime AND timestamp <= $__toTime ) ORDER BY timestamp DESC LIMIT 1000
```
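
The `$__fromTime` and `$__toTime` placeholders are Grafana macros that the data source expands at query time, so the queries above cannot be pasted into `clickhouse-client` unchanged. A minimal sketch for trying them outside Grafana follows; the function name `expand_time_macros` and the file name `query.sql` are hypothetical, and the fixed look-back window stands in for the dashboard time picker.

```shell
# Hypothetical helper: read a query from stdin and replace the Grafana time
# macros with ClickHouse expressions covering the last N minutes (default 60).
expand_time_macros() {
  minutes=${1:-60}
  sed -e 's/\$__fromTime/(now() - INTERVAL '"${minutes}"' MINUTE)/g' \
      -e 's/\$__toTime/now()/g'
}

# Usage (query.sql holds one of the panel queries above):
# expand_time_macros 15 < query.sql | docker exec -i pmm-server clickhouse-client --user=default --password=clickhouse
```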

## Troubleshooting

### Check Project Setup
```bash
cd test
bash setup-test.sh
```

### ClickHouse Data Verification
```bash
# Check table exists and has data
docker exec -it pmm-server clickhouse-client --user=default --password=clickhouse --database=otel -q "SELECT count() FROM otel.logs"

# View most recent logs
docker exec -it pmm-server clickhouse-client --user=default --password=clickhouse --database=otel -q "SELECT * FROM otel.logs ORDER BY Timestamp DESC LIMIT 10"
```

## Services and Ports

- **PMM**: https://localhost:443
- **ClickHouse Native**: localhost:9000
- **OpenTelemetry OTLP gRPC**: localhost:4317
- **OpenTelemetry OTLP HTTP**: localhost:4318
- **MailHog Web UI**: http://localhost:8025 (for testing email notifications)

## Alert System Configuration

This PoC includes a complete security alerting system that monitors:

### Security Alerts:
- **Admin Password Changes**: Detects when admin password is successfully reset
- **Failed Password Attempts**: Detects failed admin password change attempts

### Alert Configuration Files:
- `grafana/alert-rules.yml`: Defines the alert rules and queries
- `grafana/contact-points.yml`: Email notification configuration
- `grafana/notification-policies.yml`: Alert routing and grouping policies
- `grafana/datasources.yml`: ClickHouse data source for log queries

### Notification Flow:
1. OpenTelemetry Collector ingests Grafana logs
2. Logs are stored in ClickHouse `otel.logs` table
3. Grafana alert rules query ClickHouse for security events
4. Alerts are routed via notification policies
5. Email notifications are sent via configured SMTP (MailHog for testing)

## Cleanup
```bash
# Stop and remove all containers (keep the data)
docker compose down

# Stop and remove all containers and volumes (this will delete all data)
docker compose down -v

# Remove images
docker compose down --rmi all
```
25 changes: 25 additions & 0 deletions dev/otel/clickhouse/client-config.xml
@@ -0,0 +1,25 @@
<!-- Config set into /etc/clickhouse-client/. It's used if no other configs are found. -->
<config>
  <openSSL>
    <client> <!-- Used for connection to server's secure tcp port -->
      <loadDefaultCAFile>true</loadDefaultCAFile>
      <cacheSessions>true</cacheSessions>
      <disableProtocols>sslv2,sslv3</disableProtocols>
      <preferServerCiphers>true</preferServerCiphers>
      <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
      <invalidCertificateHandler>
        <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
        <name>RejectCertificateHandler</name>
      </invalidCertificateHandler>
    </client>
  </openSSL>

  <prompt_by_server_display_name>
    <default>{display_name} :) </default>
    <test>{display_name} \e[1;32m:)\e[0m </test> <!-- if it matched to the substring "test" in the server display name - -->
    <production>{display_name} \e[1;31m:)\e[0m </production> <!-- if it matched to the substring "production" in the server display name -->
  </prompt_by_server_display_name>

  <password>clickhouse</password>
  <database>otel</database>
</config>
13 changes: 13 additions & 0 deletions dev/otel/clickhouse/config.d/config-override.xml
@@ -0,0 +1,13 @@
<?xml version="1.0"?>
<clickhouse>
  <!-- Listen on all interfaces for external access -->
  <listen_host>0.0.0.0</listen_host>

  <!-- Startup scripts for database initialization -->
  <startup_scripts>
    <throw_on_error>false</throw_on_error>
    <scripts>
      <query>CREATE DATABASE IF NOT EXISTS otel</query>
    </scripts>
  </startup_scripts>
</clickhouse>