From 462ac3b80909edc0b2dd454842c268e76ed7cf43 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 6 May 2026 16:52:15 +0000
Subject: [PATCH] feat(admin): GET /admin/drift config drift endpoint (WOR-132)

Operators and dashboards can now scrape /admin/drift to see whether the
on-disk config has diverged from what the proxy has loaded, without
triggering a reload. Closes WOR-132.

Mechanics:

* AdminState gains a Mutex<Option<String>> baseline tracking the 12-char
  SHA-256 prefix of the raw YAML bytes the proxy loaded.
  with_loaded_config_content_hash() seeds it at startup; the reload
  handler refreshes it on every successful swap.
* handle_drift compares that baseline against a fresh hash of the
  on-disk file and returns {config_path, loaded_revision,
  loaded_content_hash, on_disk_content_hash, drift, on_disk_size_bytes,
  checked_at}. 503 if no on-disk path or no baseline; 500 (with
  scrubbed path) on read failure; 405 on non-GET.
* The pipeline's existing config_revision is reported alongside but
  intentionally not used for drift comparison: it is an origin-set
  identity hash and does not move when only policies, transforms, or
  ports change. The raw-bytes hash is what an operator means by drift.

Seven new admin tests cover unauthorized, method, no-path, no-baseline,
missing-file (sanitised path), no-drift, and post-edit-drift paths.
docs/configuration.md gains a /admin/drift subsection under the Admin
fields table.

Drive-by build fix: prometheus 0.14 unifies the with_label_values
generic V across the array literal, which forces all elements to the
same type. Heterogeneous &[&String, &str, ...] sites in
sbproxy-observe::metrics and sbproxy-core::server failed to compile.
Coerced every such call site to uniform &[&str] via .as_str(). No
behavioural change; CHANGELOG entry under Fixed.
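The baseline mechanics described above can be sketched in a few lines. This is a toy std-only illustration, not the proxy's code: `DriftBaseline`, `seed`, and `check` are hypothetical names, and `DefaultHasher` stands in for the real 12-char SHA-256 prefix computed by `crate::identity::config_revision`. The shape is what matters: capture a hash at load/reload, recompute on demand, and report `None` (the 503 path) when no baseline exists yet.

```rust
// Toy sketch of the /admin/drift baseline (NOT the proxy's real code).
// DefaultHasher stands in for the real SHA-256 12-char hex prefix.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// 12 hex chars, mirroring the real content-hash format.
fn content_hash(bytes: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    format!("{:012x}", h.finish() & 0xffff_ffff_ffff)
}

struct DriftBaseline {
    // Seeded at startup, refreshed on every successful reload.
    loaded: Mutex<Option<String>>,
}

impl DriftBaseline {
    fn seed(&self, yaml: &[u8]) {
        *self.loaded.lock().unwrap() = Some(content_hash(yaml));
    }

    // None when no baseline exists yet (maps to the 503 path);
    // otherwise Some(drift) comparing a fresh hash to the baseline.
    fn check(&self, on_disk: &[u8]) -> Option<bool> {
        self.loaded
            .lock()
            .unwrap()
            .as_ref()
            .map(|loaded| *loaded != content_hash(on_disk))
    }
}

fn main() {
    let b = DriftBaseline { loaded: Mutex::new(None) };
    assert_eq!(b.check(b"origins: []"), None); // no baseline yet
    b.seed(b"origins: []");
    assert_eq!(b.check(b"origins: []"), Some(false)); // no drift
    assert_eq!(b.check(b"origins: [edited]"), Some(true)); // drifted
}
```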
https://claude.ai/code/session_019zc6oCY6Kx2ssiuZEQdznk --- CHANGELOG.md | 23 ++ crates/sbproxy-core/src/admin.rs | 297 ++++++++++++++++++++++++++ crates/sbproxy-core/src/server.rs | 9 +- crates/sbproxy-observe/src/metrics.rs | 92 ++++---- docs/configuration.md | 44 ++++ 5 files changed, 424 insertions(+), 41 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index c62ea50c..801f7f6a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,18 @@ of the new YAML fields below until the version that ships them. ### Added +- **`GET /admin/drift` config drift endpoint (WOR-132).** Returns + whether the on-disk config file has diverged from what the running + proxy has loaded, without triggering a reload. Compares a + content-hash baseline captured at startup (and refreshed on every + `/admin/reload`) against a fresh hash of the current file. K8s + operators and dashboards scrape this so they can flag an edited + config that has not been hot-reloaded yet. Documented in + `docs/configuration.md` § Admin fields. + ([crates/sbproxy-core/src/admin.rs], + [crates/sbproxy-core/src/server.rs], + [docs/configuration.md]) + - **Deterministic clock-skew testing hooks.** `ClockSkewMonitor` now accepts an injected clock source for tests while production continues to use the system clock. @@ -172,6 +184,17 @@ of the new YAML fields below until the version that ships them. ### Fixed +- **Build under prometheus 0.14 type inference.** Sites in + `sbproxy-observe::metrics` and `sbproxy-core::server` that passed + heterogeneous `&[&String, &str]` arrays to + `prometheus::with_label_values` no longer compile on prometheus + 0.14 because Rust unifies the array element type to `&String` and + rejects bare `&str` literals. Coerced all such call sites to + uniform `&[&str]` via `.as_str()` so the workspace builds clean + again. No behavioural change. 
+ ([crates/sbproxy-observe/src/metrics.rs], + [crates/sbproxy-core/src/server.rs]) + - **WASM extension docs corrected.** `CLAUDE.md` previously labeled the WASM surface as "WASM stub" while marketing docs claimed production-grade support; the runtime is real diff --git a/crates/sbproxy-core/src/admin.rs b/crates/sbproxy-core/src/admin.rs index 446af773..8525084b 100644 --- a/crates/sbproxy-core/src/admin.rs +++ b/crates/sbproxy-core/src/admin.rs @@ -248,6 +248,21 @@ pub struct AdminState { /// pipeline. `None` when the admin server is constructed without /// a known on-disk config (e.g. in unit tests). pub config_path: Option<PathBuf>, + /// 12-char hex prefix of SHA-256 of the raw YAML bytes that + /// produced the running pipeline (same format as + /// [`crate::identity::config_revision`]). Set by + /// [`AdminState::with_loaded_config_content_hash`] at startup + /// and refreshed by the reload handler on every successful swap. + /// `None` until the proxy has loaded a config from disk (which + /// means `/admin/drift` cannot make a determination yet). + /// + /// Tracked alongside `pipeline.config_revision`: the pipeline + /// revision is an origin-set identity hash and does not move when + /// only policies, transforms, or ports change, so it cannot + /// answer "has the on-disk file drifted from what is loaded?". The + /// raw-bytes SHA-256 moves on any byte-level edit, which is what + /// an operator means by drift. + pub loaded_config_content_hash: Mutex<Option<String>>, /// Single-flight guard for `/admin/reload`. /// /// We CAS this from `false` to `true` on entry; if the swap @@ -276,6 +291,7 @@ impl AdminState { config, openapi_cache: Mutex::new(OpenApiCache::empty()), config_path: None, + loaded_config_content_hash: Mutex::new(None), reload_in_progress: AtomicBool::new(false), health_registry: sbproxy_observe::default_registry_optional(None, None), } @@ -292,6 +308,21 @@ impl AdminState { self } + /// Builder-style setter for the loaded-config SHA-256. 
+ /// + /// Called by the binary at startup once the initial YAML has been + /// read so `/admin/drift` can compare the on-disk file's current + /// hash against the hash captured at load time. The reload + /// handler updates the same field on every successful swap so the + /// drift baseline tracks the live pipeline. + pub fn with_loaded_config_content_hash(self, hex: impl Into<String>) -> Self { + *self + .loaded_config_content_hash + .lock() + .expect("loaded config sha256 mutex poisoned") = Some(hex.into()); + self + } + /// Replace the health registry. Wave 1 callers seed the registry /// with `sbproxy_observe::default_registry(...)` so `/readyz` /// reports the standard pillar set; subsequent waves register @@ -712,7 +743,12 @@ fn handle_reload(state: &AdminState) -> (u16, &'static str, String) { } let revision = new_pipeline.config_revision.clone(); + let content_hash = crate::identity::config_revision(yaml.as_bytes()); crate::reload::load_pipeline(new_pipeline); + *state + .loaded_config_content_hash + .lock() + .expect("loaded config content hash mutex poisoned") = Some(content_hash); let loaded_at = chrono::Utc::now().to_rfc3339(); tracing::info!( config_revision = %revision, @@ -731,6 +767,93 @@ fn handle_reload(state: &AdminState) -> (u16, &'static str, String) { ) } +// --- /admin/drift --- + +/// Compare the on-disk config file at [`AdminState::config_path`] +/// against the content-hash captured the last time the proxy loaded +/// a config (startup or [`AdminState::with_loaded_config_content_hash`] +/// or `POST /admin/reload`). +/// +/// Returns the loaded revision (origin-set identity hash), the loaded +/// content hash, the current on-disk content hash, and a `drift` +/// boolean. K8s + dashboards scrape this so an operator can see when +/// the running proxy has diverged from the declared config without +/// triggering a reload. 
+/// +/// Failure modes: +/// +/// * `503` - the admin server has no on-disk config path (test mode +/// or non-file-backed configuration), or no content-hash baseline +/// has been captured yet. Drift detection has nothing to compare +/// against. +/// * `500` - the on-disk file could not be read (permissions, ENOENT +/// after start, etc.). The error message has the path scrubbed by +/// [`sanitise_path_in_error`] so the response does not leak the +/// absolute config path. +fn handle_drift(state: &AdminState) -> (u16, &'static str, String) { + let pipeline = crate::reload::current_pipeline(); + let loaded_revision = pipeline.config_revision.clone(); + + let config_path = match &state.config_path { + Some(p) => p.clone(), + None => { + return ( + 503, + "application/json", + r#"{"error":"admin server has no on-disk config path; drift detection unavailable"}"# + .to_string(), + ); + } + }; + + let loaded_content_hash = state + .loaded_config_content_hash + .lock() + .expect("loaded config content hash mutex poisoned") + .clone(); + let loaded_content_hash = match loaded_content_hash { + Some(h) => h, + None => { + return ( + 503, + "application/json", + r#"{"error":"no loaded config content hash baseline; drift detection unavailable until first reload"}"# + .to_string(), + ); + } + }; + + let bytes = match std::fs::read(&config_path) { + Ok(b) => b, + Err(e) => { + tracing::warn!(error = %e, "admin drift: failed to read config file"); + let msg = sanitise_path_in_error(&e.to_string(), &config_path); + return ( + 500, + "application/json", + format!( + r#"{{"error":"failed to read config file: {}"}}"#, + msg.replace('"', "'") + ), + ); + } + }; + let on_disk_content_hash = crate::identity::config_revision(&bytes); + let drift = on_disk_content_hash != loaded_content_hash; + + let body = serde_json::json!({ + "config_path": config_path.display().to_string(), + "loaded_revision": loaded_revision, + "loaded_content_hash": loaded_content_hash, + "on_disk_content_hash": 
on_disk_content_hash, + "drift": drift, + "on_disk_size_bytes": bytes.len(), + "checked_at": chrono::Utc::now().to_rfc3339(), + }) + .to_string(); + (200, "application/json", body) +} + // --- Request Handler --- /// Handle an admin API request. @@ -812,6 +935,19 @@ pub fn handle_admin_request( ); } + // GET /admin/drift: compare loaded config against on-disk file. + // Read-only, idempotent, side-effect-free; only GET is accepted. + if path == "/admin/drift" { + if method.eq_ignore_ascii_case("GET") { + return handle_drift(state); + } + return ( + 405, + "application/json", + r#"{"error":"method not allowed"}"#.to_string(), + ); + } + // --- Route --- match path { // Recent request log. @@ -1488,6 +1624,167 @@ origins: ); } + // --- /admin/drift --- + + #[test] + fn admin_drift_unauthorized_returns_401() { + let state = make_state(); + let (status, _, _) = handle_admin_request("GET", "/admin/drift", &state, None); + assert_eq!(status, 401); + } + + #[test] + fn admin_drift_rejects_post() { + let state = make_state(); + let auth = basic_auth("admin", "secret"); + let (status, _, _) = handle_admin_request("POST", "/admin/drift", &state, Some(&auth)); + assert_eq!(status, 405); + } + + #[test] + fn admin_drift_without_config_path_returns_503() { + let state = make_state(); + let auth = basic_auth("admin", "secret"); + let (status, _, body) = handle_admin_request("GET", "/admin/drift", &state, Some(&auth)); + assert_eq!(status, 503); + assert!(body.contains("no on-disk config path"), "got: {body}"); + } + + #[test] + fn admin_drift_without_content_hash_baseline_returns_503() { + // config_path is set but no content-hash baseline yet (nothing + // has called `with_loaded_config_content_hash` and no reload + // has occurred). Drift cannot be determined. 
+ let f = write_yaml(&reload_yaml("drift-no-baseline.example.com")); + let state = AdminState::new(AdminConfig { + enabled: true, + port: 9090, + username: "admin".to_string(), + password: "secret".to_string(), + max_log_entries: 5, + }) + .with_config_path(f.path()); + let auth = basic_auth("admin", "secret"); + let (status, _, body) = handle_admin_request("GET", "/admin/drift", &state, Some(&auth)); + assert_eq!(status, 503); + assert!( + body.contains("no loaded config content hash baseline"), + "got: {body}" + ); + } + + #[test] + fn admin_drift_missing_file_returns_500_with_sanitised_path() { + // Point at a file that does not exist. Seed the baseline so + // we get past the no-baseline 503 path. The handler should + // surface the I/O error but scrub the absolute path so the + // body does not leak the operator's filesystem layout. + let dir = tempfile::tempdir().expect("tempdir"); + let bogus = dir.path().join("does-not-exist.yml"); + let state = AdminState::new(AdminConfig { + enabled: true, + port: 9090, + username: "admin".to_string(), + password: "secret".to_string(), + max_log_entries: 5, + }) + .with_config_path(&bogus) + .with_loaded_config_content_hash("deadbeefcafe"); + let auth = basic_auth("admin", "secret"); + let (status, ct, body) = handle_admin_request("GET", "/admin/drift", &state, Some(&auth)); + assert_eq!(status, 500, "body: {body}"); + assert_eq!(ct, "application/json"); + let abs = bogus.to_string_lossy().to_string(); + assert!( + !body.contains(&abs), + "absolute path leaked into error: {body}" + ); + } + + #[test] + fn admin_drift_after_reload_reports_no_drift() { + // Reload to make the loaded revision deterministic, then + // query drift against the same file: revisions match, drift + // is false. 
+ let f = write_yaml(&reload_yaml("reload-drift-noop.example.com")); + let state = AdminState::new(AdminConfig { + enabled: true, + port: 9090, + username: "admin".to_string(), + password: "secret".to_string(), + max_log_entries: 5, + }) + .with_config_path(f.path()); + let auth = basic_auth("admin", "secret"); + let (rstatus, _, _) = handle_admin_request("POST", "/admin/reload", &state, Some(&auth)); + assert_eq!(rstatus, 200); + + let (status, ct, body) = handle_admin_request("GET", "/admin/drift", &state, Some(&auth)); + assert_eq!(status, 200, "body: {body}"); + assert_eq!(ct, "application/json"); + let parsed: serde_json::Value = serde_json::from_str(&body).expect("valid json"); + assert_eq!(parsed.get("drift").and_then(|v| v.as_bool()), Some(false)); + let loaded = parsed + .get("loaded_content_hash") + .and_then(|v| v.as_str()) + .expect("loaded_content_hash string"); + let on_disk = parsed + .get("on_disk_content_hash") + .and_then(|v| v.as_str()) + .expect("on_disk_content_hash string"); + assert_eq!(loaded, on_disk, "content hashes should match after reload"); + // The origin-set identity hash also surfaces; sanity-check + // that it's a 12-char hex string (matches config_revision()'s + // contract). + let origin_revision = parsed + .get("loaded_revision") + .and_then(|v| v.as_str()) + .expect("loaded_revision string"); + assert_eq!(origin_revision.len(), 12); + assert!(parsed.get("on_disk_size_bytes").is_some()); + assert!(parsed.get("checked_at").is_some()); + } + + #[test] + fn admin_drift_after_file_change_reports_drift() { + // Reload, mutate the file, query drift: on-disk hash differs + // from the loaded revision. 
+ let f = write_yaml(&reload_yaml("reload-drift-edit-a.example.com")); + let state = AdminState::new(AdminConfig { + enabled: true, + port: 9090, + username: "admin".to_string(), + password: "secret".to_string(), + max_log_entries: 5, + }) + .with_config_path(f.path()); + let auth = basic_auth("admin", "secret"); + let (rstatus, _, _) = handle_admin_request("POST", "/admin/reload", &state, Some(&auth)); + assert_eq!(rstatus, 200); + + // Edit the file in place. The loaded pipeline still has the + // pre-edit revision; the on-disk file hashes differently. + std::fs::write( + f.path(), + reload_yaml("reload-drift-edit-b.example.com").as_bytes(), + ) + .expect("rewrite yaml"); + + let (status, _, body) = handle_admin_request("GET", "/admin/drift", &state, Some(&auth)); + assert_eq!(status, 200, "body: {body}"); + let parsed: serde_json::Value = serde_json::from_str(&body).expect("valid json"); + assert_eq!(parsed.get("drift").and_then(|v| v.as_bool()), Some(true)); + let loaded = parsed + .get("loaded_content_hash") + .and_then(|v| v.as_str()) + .expect("loaded_content_hash string"); + let on_disk = parsed + .get("on_disk_content_hash") + .and_then(|v| v.as_str()) + .expect("on_disk_content_hash string"); + assert_ne!(loaded, on_disk, "revisions should differ after file change"); + } + // --- Rate Limiter --- #[test] diff --git a/crates/sbproxy-core/src/server.rs b/crates/sbproxy-core/src/server.rs index 5e9d55f3..f074c15a 100644 --- a/crates/sbproxy-core/src/server.rs +++ b/crates/sbproxy-core/src/server.rs @@ -10228,7 +10228,7 @@ impl ProxyHttp for SbProxy { if duration > 0.0 { metrics() .request_duration - .with_label_values(&[&hostname]) + .with_label_values(&[hostname.as_str()]) .observe(duration); } @@ -10244,7 +10244,7 @@ impl ProxyHttp for SbProxy { if _e.is_some() { metrics() .errors_total - .with_label_values(&[&hostname, "proxy_error"]) + .with_label_values(&[hostname.as_str(), "proxy_error"]) .inc(); } @@ -11136,6 +11136,7 @@ pub fn run(config_path: &str) 
-> anyhow::Result<()> { // Load and compile the config. let yaml = std::fs::read_to_string(config_path) .map_err(|e| anyhow::anyhow!("failed to read config file '{}': {}", config_path, e))?; + let initial_content_hash = crate::identity::config_revision(yaml.as_bytes()); let compiled = sbproxy_config::compile_config(&yaml)?; if let Some(al) = compiled.access_log.as_ref() { log_capture_header_warnings(al); @@ -11535,7 +11536,9 @@ pub fn run(config_path: &str) -> anyhow::Result<()> { // the AdminState so a manual reload during a watcher reload // serialises cleanly. let admin_state = std::sync::Arc::new( - crate::admin::AdminState::new(admin_cfg).with_config_path(config_path), + crate::admin::AdminState::new(admin_cfg) + .with_config_path(config_path) + .with_loaded_config_content_hash(initial_content_hash.clone()), ); // Pingora's `Server::run_forever` builds its own multi-thread // tokio runtime; spawning before run_forever installs the diff --git a/crates/sbproxy-observe/src/metrics.rs b/crates/sbproxy-observe/src/metrics.rs index 649fe855..f991e34c 100644 --- a/crates/sbproxy-observe/src/metrics.rs +++ b/crates/sbproxy-observe/src/metrics.rs @@ -588,23 +588,23 @@ pub fn record_request_with_labels( // affecting the per-origin views below. 
m.requests_total .with_label_values(&[ - &hostname_san, + hostname_san.as_str(), method, - &status_str, - &agent_id, - &agent_class, - &agent_vendor, - &payment_rail, - &content_shape, + status_str.as_str(), + agent_id.as_str(), + agent_class.as_str(), + agent_vendor.as_str(), + payment_rail.as_str(), + content_shape.as_str(), ]) .inc(); // --- Per-origin views (unchanged label set; pre-existing) --- m.per_origin_requests_total - .with_label_values(&[&origin_san, method, &status_str]) + .with_label_values(&[origin_san.as_str(), method, status_str.as_str()]) .inc(); m.per_origin_request_duration - .with_label_values(&[&origin_san, method, &status_str]) + .with_label_values(&[origin_san.as_str(), method, status_str.as_str()]) .observe(duration_secs); // Wave 1 exemplar: stamp the active trace_id onto the latency // histogram so Grafana's "click an outlier" path reaches the @@ -634,12 +634,12 @@ pub fn record_request_with_labels( ); if bytes_in > 0 { m.bytes_total - .with_label_values(&[&origin_san, "in"]) + .with_label_values(&[origin_san.as_str(), "in"]) .inc_by(bytes_in as f64); } if bytes_out > 0 { m.bytes_total - .with_label_values(&[&origin_san, "out"]) + .with_label_values(&[origin_san.as_str(), "out"]) .inc_by(bytes_out as f64); } } @@ -652,7 +652,7 @@ pub fn record_auth(origin: &str, auth_type: &str, allowed: bool) { let result = if allowed { "allow" } else { "deny" }; metrics() .auth_results - .with_label_values(&[&origin, auth_type, result]) + .with_label_values(&[origin.as_str(), auth_type, result]) .inc(); } @@ -684,7 +684,13 @@ pub fn record_policy_with_labels( ); metrics() .policy_triggers - .with_label_values(&[&origin_san, policy_type, action, &agent_id, &agent_class]) + .with_label_values(&[ + origin_san.as_str(), + policy_type, + action, + agent_id.as_str(), + agent_class.as_str(), + ]) .inc(); } @@ -705,7 +711,9 @@ pub fn record_capture_budget_drop(workspace_id: &str, dimension: &'static str) { .expect("capture budget counter registers") }); let 
workspace = sanitize_label("workspace", workspace_id); - counter.with_label_values(&[&workspace, dimension]).inc(); + counter + .with_label_values(&[workspace.as_str(), dimension]) + .inc(); } /// Record drop counters returned by the Wave 8 capture helpers. @@ -737,7 +745,7 @@ pub fn record_capture_drop( }); let workspace = sanitize_label("workspace", workspace_id); counter - .with_label_values(&[&workspace, dimension, reason]) + .with_label_values(&[workspace.as_str(), dimension, reason]) .inc_by(n); } @@ -758,7 +766,9 @@ pub fn record_a2a_hop(route: &str, spec: &str, decision: &str) { .expect("a2a hops counter registers") }); let route = sanitize_label("route", route); - counter.with_label_values(&[&route, spec, decision]).inc(); + counter + .with_label_values(&[route.as_str(), spec, decision]) + .inc(); } /// Record an A2A chain depth observation (Wave 7 / A7.2). Surfaces @@ -778,7 +788,7 @@ pub fn record_a2a_chain_depth(route: &str, spec: &str, depth: u32) { .expect("a2a chain depth histogram registers") }); let route = sanitize_label("route", route); - hist.with_label_values(&[&route, spec]) + hist.with_label_values(&[route.as_str(), spec]) .observe(depth as f64); } @@ -798,7 +808,7 @@ pub fn record_a2a_denied(route: &str, reason: &str) { .expect("a2a denied counter registers") }); let route = sanitize_label("route", route); - counter.with_label_values(&[&route, reason]).inc(); + counter.with_label_values(&[route.as_str(), reason]).inc(); } /// Record a request blocked by the `http_framing` policy. 
The @@ -845,7 +855,7 @@ pub fn record_cache(origin: &str, result: &str) { let origin = sanitize_label("origin", origin); metrics() .cache_results - .with_label_values(&[&origin, result]) + .with_label_values(&[origin.as_str(), result]) .inc(); } @@ -854,7 +864,7 @@ pub fn record_circuit_breaker(origin: &str, from_state: &str, to_state: &str) { let origin = sanitize_label("origin", origin); metrics() .circuit_breaker_transitions - .with_label_values(&[&origin, from_state, to_state]) + .with_label_values(&[origin.as_str(), from_state, to_state]) .inc(); } @@ -1030,14 +1040,20 @@ mod tests { let count = m .per_origin_requests_total - .with_label_values(&[&sanitized, "GET", "200"]) + .with_label_values(&[sanitized.as_str(), "GET", "200"]) .get(); assert_eq!(count, 2.0, "expected 2 requests recorded"); - let bytes_in = m.bytes_total.with_label_values(&[&sanitized, "in"]).get(); + let bytes_in = m + .bytes_total + .with_label_values(&[sanitized.as_str(), "in"]) + .get(); assert_eq!(bytes_in, 3072.0, "bytes_in should be 1024 + 2048"); - let bytes_out = m.bytes_total.with_label_values(&[&sanitized, "out"]).get(); + let bytes_out = m + .bytes_total + .with_label_values(&[sanitized.as_str(), "out"]) + .get(); assert_eq!(bytes_out, 768.0, "bytes_out should be 512 + 256"); } @@ -1053,13 +1069,13 @@ mod tests { let allow_count = m .auth_results - .with_label_values(&[&sanitized, "api_key", "allow"]) + .with_label_values(&[sanitized.as_str(), "api_key", "allow"]) .get(); assert_eq!(allow_count, 1.0); let deny_count = m .auth_results - .with_label_values(&[&sanitized, "api_key", "deny"]) + .with_label_values(&[sanitized.as_str(), "api_key", "deny"]) .get(); assert_eq!(deny_count, 2.0); } @@ -1079,19 +1095,19 @@ mod tests { // sentinel. Read back with the same label tuple. 
let rl = m .policy_triggers - .with_label_values(&[&sanitized, "rate_limit", "deny", "", ""]) + .with_label_values(&[sanitized.as_str(), "rate_limit", "deny", "", ""]) .get(); assert_eq!(rl, 1.0); let ip = m .policy_triggers - .with_label_values(&[&sanitized, "ip_filter", "deny", "", ""]) + .with_label_values(&[sanitized.as_str(), "ip_filter", "deny", "", ""]) .get(); assert_eq!(ip, 1.0); let waf = m .policy_triggers - .with_label_values(&[&sanitized, "waf", "allow", "", ""]) + .with_label_values(&[sanitized.as_str(), "waf", "allow", "", ""]) .get(); assert_eq!(waf, 1.0); } @@ -1221,14 +1237,14 @@ mod tests { let count = m .requests_total .with_label_values(&[ - &origin_san, + origin_san.as_str(), "GET", "200", - &agent_id_san, - &agent_class_san, - &agent_vendor_san, - &payment_rail_san, - &content_shape_san, + agent_id_san.as_str(), + agent_class_san.as_str(), + agent_vendor_san.as_str(), + payment_rail_san.as_str(), + content_shape_san.as_str(), ]) .get(); assert!(count >= 1, "agent-labelled request must increment"); @@ -1245,7 +1261,7 @@ mod tests { // tuple, which is the "no agent context attached" series. let count = m .requests_total - .with_label_values(&[&origin_san, "POST", "201", "", "", "", "", ""]) + .with_label_values(&[origin_san.as_str(), "POST", "201", "", "", "", "", ""]) .get(); assert_eq!(count, 1, "legacy record_request must use empty sentinel"); } @@ -1274,11 +1290,11 @@ mod tests { let count = m .policy_triggers .with_label_values(&[ - &origin_san, + origin_san.as_str(), "rate_limit", "deny", - &agent_id_san, - &agent_class_san, + agent_id_san.as_str(), + agent_class_san.as_str(), ]) .get(); assert!(count >= 1.0, "policy trigger must stamp agent_id"); diff --git a/docs/configuration.md b/docs/configuration.md index 7afb9ec4..947b773e 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -209,11 +209,55 @@ per-IP rate limit. Endpoints: | `GET /api/health` | Liveness check returning `{"status":"ok"}`. 
| | `GET /api/openapi.json` | Emitted OpenAPI 3.0 document for the running pipeline. | | `GET /api/openapi.yaml` | Same document in YAML. | +| `POST /admin/reload` | Re-read the on-disk config file and hot-swap the pipeline. Single-flight; concurrent calls return 409. | +| `GET /admin/drift` | Compare the on-disk config file against the loaded baseline. See below. | Unauthenticated requests get a 401 with a `WWW-Authenticate: Basic` header. Requests from outside `127.0.0.1` are dropped at the socket level. +#### `GET /admin/drift` + +Returns whether the on-disk config file has diverged from what the +running proxy has loaded, without triggering a reload. K8s +operators and dashboards scrape this so they can flag a config that +was edited on disk but not yet hot-reloaded. + +Response shape (200 OK): + +```json +{ + "config_path": "/etc/sbproxy/sb.yml", + "loaded_revision": "a3f5b1d829c4", + "loaded_content_hash": "8e1c5d4a9f7b", + "on_disk_content_hash": "8e1c5d4a9f7b", + "drift": false, + "on_disk_size_bytes": 4321, + "checked_at": "2026-05-06T15:42:00Z" +} +``` + +* `loaded_revision` is the 12-char origin-set identity hash from the + running pipeline. Stable when only policies, transforms, or ports + change; moves when origins or hostnames are added or removed. +* `loaded_content_hash` is the 12-char SHA-256 prefix of the raw YAML + bytes captured at load time (startup or last successful + `/admin/reload`). +* `on_disk_content_hash` is the same hash recomputed against the + current file contents. +* `drift` is `true` iff the two content hashes differ. + +Failure modes: + +* `503` - the admin server has no on-disk config path (constructed + without `with_config_path`, e.g. tests), or no content-hash + baseline has been captured yet (no startup load and no successful + reload). +* `500` - the on-disk file could not be read. The error message has + the absolute path scrubbed so the response does not leak the + operator's filesystem layout. 
+* `405` - any verb other than `GET`. + ### Metrics fields | Field | Type | Default | Description |
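The status-code contract documented in the `/admin/drift` subsection can be summarised as a small decision function. This is an illustrative restatement of the documented failure modes, not the proxy's actual handler; `drift_status` and its boolean parameters are hypothetical names.

```rust
// Illustrative summary of the documented /admin/drift status codes
// (NOT the proxy's real handler).
fn drift_status(method: &str, has_path: bool, has_baseline: bool, read_ok: bool) -> u16 {
    if !method.eq_ignore_ascii_case("GET") {
        return 405; // read-only endpoint; only GET is accepted
    }
    if !has_path || !has_baseline {
        return 503; // nothing to compare against yet
    }
    if !read_ok {
        return 500; // on-disk file unreadable (path scrubbed in body)
    }
    200 // body carries the drift boolean
}

fn main() {
    assert_eq!(drift_status("POST", true, true, true), 405);
    assert_eq!(drift_status("GET", false, true, true), 503);
    assert_eq!(drift_status("GET", true, false, true), 503);
    assert_eq!(drift_status("GET", true, true, false), 500);
    assert_eq!(drift_status("GET", true, true, true), 200);
}
```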