Security-focused AI infrastructure demo for authenticated model access, role-based authorization, per-model rate limits, audit logs, and policy-driven inference routing.
This repository uses a mock inference backend so the security and infrastructure logic can be reviewed without GPU hardware, model weights, proprietary data, or cloud credentials.
AI infrastructure teams need more than a model endpoint. They need controls around who can call which model, why the call is allowed, how abuse is throttled, and what evidence exists after the fact. This project models that control plane in a public-safe way.
- FastAPI inference gateway.
- Demo principals and role-based model policies.
- Reason-for-access enforcement for sensitive models.
- Fixed-window rate limiting by principal and model.
- Structured JSONL audit logging.
- Prometheus-compatible
/metricsendpoint for policy, limiter, and latency telemetry. - Mock GPU inference backend with latency metadata.
- Focused unit tests for policy and limiter behavior.
- Threat model and architecture notes.
- Kubernetes deployment example with health probes and scrape annotations.
- SLO, alert, and incident runbook notes.
This repo implements controls around model access, explains why a request was allowed or denied, preserves audit evidence, measures service behavior, and keeps policy, rate limiting, API routing, metrics, and inference concerns separated.
Relevant areas:
- Security infrastructure: authentication boundaries, authorization policy, audit trails, rate limits, and threat modeling.
- AI platform engineering: protected inference paths, model routing extension points, mockable backends, and operational metadata.
- Infrastructure/SRE: Prometheus-style metrics, health probes, SLO notes, runbooks, and Kubernetes deployment shape.
- Backend engineering: FastAPI service structure, focused tests, clear module boundaries, and production-control roadmap.
- Start with
gateway/app.pyfor request orchestration. - Review
gateway/policy.pyfor role and reason-for-access decisions. - Review
gateway/rate_limit.pyfor limiter behavior. - Review
gateway/metrics.pyand/metricsfor Prometheus-compatible operational telemetry. - Review
gateway/audit.pyfor structured evidence. - Review
docs/OPERATIONS.mdanddeploy/kubernetes/gateway.yamlfor SLO/runbook and deployment thinking. - Check
tests/for behavior-focused coverage. - Read
docs/PORTFOLIO_REVIEW.mdfor the role-specific review guide.
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
uvicorn gateway.app:app --reloadHealth check:
curl http://localhost:8000/healthMetrics:
curl http://localhost:8000/metricsInference request:
curl -X POST http://localhost:8000/v1/infer/mission-summarizer \
-H "Content-Type: application/json" \
-H "X-Principal-Id: analyst-1" \
-d "{\"input\":\"Summarize synthetic maintenance delays\", \"reason\":\"readiness review\"}"python -m unittest discover -s testsThis project covers:
- Access-control thinking around model-serving systems.
- Public-safe audit and policy design.
- Prometheus-compatible metrics for policy denials, rate limits, and inference latency.
- Kubernetes-ready health probes, scrape annotations, and non-root runtime posture.
- Backend service design with clear separation between API, policy, rate limiting, and inference.
- A credible path toward production controls such as OIDC, mTLS, external policy engines, GPU telemetry, and model routing.
- Replace demo principals with OIDC/JWT verification.
- Add Redis-backed or gateway-level distributed rate limiting.
- Add OpenTelemetry traces and Grafana dashboard screenshots.
- Add policy-as-code examples and negative authorization tests.
- Replace the
emptyDirdemo audit volume with durable log shipping or object storage retention.
All users, models, prompts, and outputs are synthetic. Do not add secrets, customer data, real tokens, production logs, model weights, or sensitive operational details.