Feat/grafana monitoring #5
Merged
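Complete Phase 1-4 monitoring solution, served from a single origin throughout and automatically following model start/stop:
- backend: add a prometheus_targets service; the reconciler dynamically writes file_sd targets whenever a vLLM instance enters or leaves READY, so Prometheus discovers the fleet automatically with no config changes (LLMOPS_PROMETHEUS_SD_PATH; includes unit tests)
- deploy: add prometheus / grafana / dcgm-exporter / node-exporter services; prometheus and grafana share a netns with backend; nginx reverse-proxies /grafana (single origin, with absolute_redirect off to fix port redirects)
- grafana: provision the datasource plus the official vLLM (Performance/Query), DCGM, and Node Exporter dashboards; add a custom "vLLM Scheduling & Capacity" dashboard (scheduling/capacity/workload, with datasource + model_name + instance variables), 4 vLLM alert rules, and a webhook contact point (supplied via env)
- frontend: add a "Monitoring" tab that embeds 5 dashboards (kiosk mode, theme sync)
- remove /trends, now superseded by grafana (frontend page plus the backend timeseries endpoint and store methods)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>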
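The diff for the prometheus_targets service is not shown on this page, so the following is only a minimal sketch of how a file_sd target file could be written, not the PR's actual code. It assumes the reconciler passes in a mapping of READY models to scrape addresses. The function names and the `llmops_model` label are hypothetical. The JSON shape (a list of groups with `targets` and `labels`) is the standard Prometheus file_sd format.

```python
import json
import os
import tempfile
from pathlib import Path


def build_target_groups(ready_models):
    """Turn {model_name: "host:port"} into Prometheus file_sd target groups.

    The "llmops_model" label name is a hypothetical choice for illustration.
    """
    return [
        {"targets": [address], "labels": {"llmops_model": name}}
        for name, address in sorted(ready_models.items())
    ]


def write_file_sd(path, ready_models):
    """Atomically rewrite the file_sd JSON so a reader never sees a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(build_target_groups(ready_models), handle, indent=2)
        # mkstemp creates the file as 0600; relax it so another user can read it.
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Under these assumptions, a reconciler would call something like `write_file_sd(os.environ["LLMOPS_PROMETHEUS_SD_PATH"], ready_models)` each time the set of READY models changes. Passing an empty mapping writes `[]`, which clears all targets.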
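- Add a "vLLM Overview" dashboard (single pane of glass): a system-health stat row, latency/throughput, capacity, and infrastructure (GPU + host) condensed onto one page; TTFT/E2E/KV threshold lines, an embedded alert-list panel, and annotations for model (re)start events detected via process_start_time_seconds
- frontend: add an "Overview" tab to the "Monitoring" page and make it the default
- Official vLLM (Performance/Query) + Node Exporter dashboards: set spanNulls=true on all time-series panels so lines no longer break under intermittent traffic
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>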
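The annotation query itself is not shown on this page. As a rough sketch of the idea behind it, a restart shows up as a change in the value of process_start_time_seconds between consecutive scrapes. The helper below is hypothetical and works on already-fetched samples rather than querying Prometheus.

```python
def restart_events(samples):
    """Return the scrape timestamps at which the process start time changed.

    samples: iterable of (scrape_timestamp, process_start_time_seconds)
    pairs in time order. The first sample only sets the baseline.
    """
    events = []
    previous = None
    for scrape_ts, start_time in samples:
        if previous is not None and start_time != previous:
            events.append(scrape_ts)
        previous = start_time
    return events


samples = [(100, 50.0), (115, 50.0), (130, 120.0), (145, 120.0), (160, 155.0)]
print(restart_events(samples))  # [130, 160]
```

In PromQL the same idea is usually expressed with the `changes()` function over a range vector of this metric. Whether the dashboard does exactly that is an assumption.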
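- Observability section now lists Grafana monitoring (dynamic SD discovery, GPU/host metrics, embedded monitoring tab, threshold lines/annotations/alerts); remove the deleted Trends entry
- Docker topology table adds prometheus / grafana / dcgm-exporter / node-exporter; the frontend row adds the /grafana reverse proxy; the explanatory paragraph adds netns sharing and the prometheus/grafana volumes
- Add a "Monitoring (Grafana)" subsection (English/Chinese); both versions kept in sync
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>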
No description provided.