Recommendations

This section provides actionable security guidance based on real-world deployment experience. Following these recommendations will help you avoid common pitfalls and maintain a secure sandbox environment over time.

Security Recommendations

These DOs and DON'Ts are derived from security audits and incident analysis. Each item addresses a specific vulnerability that has been exploited in production container environments.

DO:

  1. Always use gVisor (runsc) for multi-tenant workloads
  2. Enable enforce_container_binding: true in JWT
  3. Use 9p mounts with ro flag for sensitive data
  4. Drop ALL capabilities then add back minimum required
  5. Set no-new-privileges: true
  6. Implement network egress whitelist via proxy
  7. Use ephemeral containers (destroy after session)
  8. Monitor resource usage continuously
  9. Implement audit logging for all operations
  10. Perform regular security testing (penetration tests)
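As an illustration, several of the DO items map directly onto `docker run` flags. The sketch below assembles such a command line. It assumes `runsc` is already registered as a runtime in the Docker daemon configuration. The function name, the `sandbox-egress` network (assumed to route traffic through your egress proxy), and the default limit values are hypothetical placeholders rather than recommended numbers. Under `runsc`, bind mounts are served to the sandbox through gVisor's file proxy, and the `readonly` mount option supplies the ro flag from item 3.

```python
import shlex

def hardened_run_args(image, session_id, ro_mounts=None, cap_add=None,
                      memory="512m", cpus="1.0", pids_limit=256,
                      network="sandbox-egress"):
    """Build a `docker run` argument list reflecting the DO list (hypothetical helper)."""
    args = [
        "docker", "run",
        "--rm",                                   # DO 7: ephemeral, removed when it exits
        "--runtime=runsc",                        # DO 1: gVisor (must be registered with dockerd)
        "--cap-drop=ALL",                         # DO 4: drop every capability first...
        "--security-opt=no-new-privileges:true",  # DO 5
        f"--memory={memory}",                     # resource limits (placeholder values)
        f"--cpus={cpus}",
        f"--pids-limit={pids_limit}",
        f"--network={network}",                   # DO 6: hypothetical proxy-only network
        f"--name=sandbox-{session_id}",
    ]
    for cap in cap_add or []:
        args.append(f"--cap-add={cap}")           # ...then add back only what is required
    for host_path, container_path in (ro_mounts or {}).items():
        # DO 3: sensitive data is mounted read-only
        args.append(f"--mount=type=bind,source={host_path},"
                    f"target={container_path},readonly")
    args.append(image)
    return args

if __name__ == "__main__":
    print(shlex.join(hardened_run_args("python:3.12-slim", "abc123",
                                       {"/srv/data": "/data"})))
```

Building the command as a list, rather than a single string, avoids shell-quoting problems when mount paths or session IDs come from elsewhere in your system.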

DON'T:

  1. Never use default Docker runtime for untrusted code
  2. Never mount /var/run/docker.sock in containers
  3. Never run containers with --privileged
  4. Never trust user input without validation
  5. Never allow arbitrary network egress
  6. Never skip resource limits (memory, CPU, PIDs)
  7. Never use long-lived JWT tokens (max 4 hours)
  8. Never expose container management API without auth
  9. Never store secrets in environment variables (use secrets manager)
  10. Never disable security features "for convenience"
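Items 2 (DO) and 7 (DON'T) both concern token validation. Below is a minimal sketch of the checks a server might apply after a JWT library has already verified the signature: reject expired tokens, tokens issued with a lifetime over 4 hours, and tokens not bound to the container being accessed. This document does not specify how `enforce_container_binding` is implemented, so the `container_id` claim name and the `token_problem` helper are hypothetical.

```python
import time

MAX_TOKEN_LIFETIME_S = 4 * 60 * 60  # DON'T 7: tokens live at most 4 hours

def token_problem(claims, container_id, now=None):
    """Return a rejection reason for decoded JWT claims, or None if acceptable.

    Assumes the signature was already verified by a JWT library; this only
    checks expiry, maximum lifetime, and container binding.
    """
    now = time.time() if now is None else now
    iat, exp = claims.get("iat"), claims.get("exp")
    if not isinstance(iat, (int, float)) or not isinstance(exp, (int, float)):
        return "missing iat/exp"
    if exp <= now:
        return "token expired"
    if exp - iat > MAX_TOKEN_LIFETIME_S:
        return "token lifetime exceeds 4 hours"
    # "container_id" is a hypothetical claim name standing in for whatever
    # enforce_container_binding actually compares.
    if claims.get("container_id") != container_id:
        return "token not bound to this container"
    return None
```

Checking `exp - iat` catches tokens minted with an over-long lifetime even while they are still valid, so a misconfigured issuer is detected on first use instead of hours later.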

Incident Response

Despite best efforts, security incidents may occur. This section provides a tiered response framework that helps you quickly assess severity and take appropriate action without over- or under-reacting.

Detection

Early detection is critical. These metrics indicate potential security events that require investigation:

  • OOM kills (>5/hour = investigate)
  • Failed auth attempts (>10/minute = alert)
  • Network blocks (>100/minute = analyze)
  • Container escape attempts (any = CRITICAL)
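The thresholds above are rate limits over different time windows. One way to evaluate them is a per-signal sliding window of event timestamps, sketched below. The event kind names and the `RateDetector` class are hypothetical, and timestamps are assumed to arrive in non-decreasing order. An escape attempt is modeled as a limit of zero, so any single occurrence fires.

```python
from collections import deque

# (max events tolerated, window in seconds, action) taken from the Detection list.
THRESHOLDS = {
    "oom_kill": (5, 3600, "investigate"),      # >5/hour
    "auth_failure": (10, 60, "alert"),         # >10/minute
    "network_block": (100, 60, "analyze"),     # >100/minute
    "escape_attempt": (0, 1, "CRITICAL"),      # any occurrence
}

class RateDetector:
    """Sliding-window counter per event kind (hypothetical helper)."""

    def __init__(self, thresholds=THRESHOLDS):
        self.thresholds = thresholds
        self.events = {kind: deque() for kind in thresholds}

    def record(self, kind, ts):
        """Record an event at time ts (seconds). Return the action if the
        threshold is exceeded, else None. Assumes ts never decreases."""
        limit, window, action = self.thresholds[kind]
        q = self.events[kind]
        q.append(ts)
        # Drop events that have fallen out of the window.
        while q and q[0] <= ts - window:
            q.popleft()
        return action if len(q) > limit else None
```

A deque keeps both appends and evictions O(1), so the detector can sit directly on a high-volume stream such as proxy block events.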

Response Procedure

Respond based on severity. Not every alert requires an emergency shutdown: over-reacting can cause unnecessary downtime, while under-reacting can allow a breach to escalate.

Level 1: Suspicious Activity

At this stage, indicators suggest possible reconnaissance or probing. Continue operations but increase monitoring to gather more data before taking disruptive action.

  1. Increase logging verbosity
  2. Review audit logs
  3. Monitor resource usage
  4. Continue operation with heightened monitoring
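For a Python service, step 1 can be done at runtime without a restart. The sketch below uses the standard `logging` module to lower a logger's threshold for the duration of an investigation and restore it afterwards. The logger name `sandbox` is a hypothetical placeholder. Note that handlers carry their own levels, so a handler set to `WARNING` will still filter out debug records.

```python
import logging
from contextlib import contextmanager

@contextmanager
def heightened_logging(logger_name="sandbox", level=logging.DEBUG):
    """Temporarily lower a logger's threshold, restoring the previous level on exit."""
    logger = logging.getLogger(logger_name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(previous)  # restored even if the investigation code raises
```

Restoring the level in `finally` keeps a Level 1 investigation from silently leaving verbose logging on, which would inflate log volume and could expose more data than intended.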

Level 2: Confirmed Attack

Evidence confirms an active breach or successful exploitation. Take immediate action to contain the damage while preserving forensic evidence for investigation.

  1. Isolate affected container(s)
  2. Capture memory dump for forensics
  3. Block source IP/user
  4. Rotate JWT signing keys
  5. Review all sessions from affected user
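Step 4 differs from the routine monthly rotation listed under Maintenance. A scheduled rotation can keep the previous key trusted for a grace period (with the 4-hour token cap, old keys can be dropped after 4 hours). A rotation after a confirmed attack should revoke old keys immediately so that stolen tokens stop working. The keyring below sketches both modes using HMAC-SHA256 and key IDs, in the spirit of the JWT `kid` header. It is a simplified stand-in and not a full JWT implementation; the class and its methods are hypothetical.

```python
import hashlib
import hmac
import secrets

class SigningKeyring:
    """Sign with the newest key; verify against any key still trusted (hypothetical)."""

    def __init__(self):
        self.keys = {}          # kid -> secret bytes
        self.current_kid = None
        self._counter = 0

    def rotate(self, revoke_previous=False):
        """Create a new signing key. revoke_previous=True (incident response)
        invalidates every token signed with an older key."""
        if revoke_previous:
            self.keys.clear()
        self._counter += 1
        kid = f"k{self._counter}"
        self.keys[kid] = secrets.token_bytes(32)
        self.current_kid = kid
        return kid

    def sign(self, payload):
        mac = hmac.new(self.keys[self.current_kid], payload, hashlib.sha256).hexdigest()
        return self.current_kid, mac

    def verify(self, kid, payload, mac):
        key = self.keys.get(kid)
        if key is None:
            return False  # unknown or revoked key
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, mac)  # constant-time comparison
```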

Level 3: System Compromise

A full system compromise has been detected: the attacker has gained significant access, or the integrity of the entire cluster is in question. Execute emergency procedures immediately.

  1. Trigger emergency shutdown
  2. Isolate entire cluster
  3. Preserve evidence (logs, memory dumps, network captures)
  4. Notify security team
  5. Begin forensic investigation
  6. Restore from known-good backup

Post-Incident

After containment, focus on learning from the incident and preventing recurrence. The post-incident phase is critical for improving your security posture.

  • Root cause analysis
  • Update security controls
  • Patch vulnerabilities
  • Update documentation
  • Team training

Maintenance

Security is not a one-time setup but an ongoing process. This maintenance schedule ensures your sandbox remains secure as new vulnerabilities are discovered and your infrastructure evolves.

Daily:

  • Monitor active container count
  • Review failed authentication attempts
  • Check OOM kills
  • Verify network proxy health
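For the daily OOM check, `docker inspect` reports a `State.OOMKilled` flag for each container. The sketch below parses that JSON output and lists the affected containers; the function name is hypothetical. One caveat: ephemeral containers started with `--rm` are removed once they exit and no longer appear in `docker inspect`, so for those you would need to collect OOM events as they happen (for example from the Docker daemon's event stream) rather than inspecting after the fact.

```python
import json

def oom_killed(inspect_json):
    """Return names of containers whose State.OOMKilled flag is set.

    `inspect_json` is the text printed by `docker inspect <id> [<id> ...]`.
    """
    containers = json.loads(inspect_json)
    return [c.get("Name", "").lstrip("/")  # Docker prefixes names with "/"
            for c in containers
            if c.get("State", {}).get("OOMKilled")]
```

In practice the input would come from something like `subprocess.run(["docker", "inspect", *ids], capture_output=True, text=True, check=True).stdout`.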

Weekly:

  • Review audit logs
  • Test backup restoration
  • Check for CVEs in base images
  • Update dependencies

Monthly:

  • Penetration testing
  • Security audit
  • Review and update whitelist
  • Rotate JWT signing keys
  • Update documentation
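The monthly whitelist review is easier with two small checks, sketched below under the assumption that you can extract the requested hostnames from your egress proxy's logs. The first matches hosts on label boundaries, avoiding the common bug where a plain suffix test for `example.com` also admits `evil-example.com`. The second lists whitelist entries that no request used during the review period, making them candidates for removal. Both function names are hypothetical.

```python
def host_allowed(host, whitelist):
    """True if host equals a whitelisted domain or is a subdomain of one."""
    host = host.rstrip(".").lower()
    for entry in whitelist:
        entry = entry.rstrip(".").lower()
        if host == entry or host.endswith("." + entry):
            return True
    return False

def stale_entries(whitelist, requested_hosts):
    """Whitelist entries that matched no request in the review period."""
    used = {entry for entry in whitelist
            for host in requested_hosts if host_allowed(host, [entry])}
    return sorted(set(whitelist) - used)
```

Removing stale entries shrinks the set of destinations an attacker could use for exfiltration, which is the point of restricting egress in the first place.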

Quarterly:

  • Third-party security assessment
  • Disaster recovery drill
  • Compliance audit
  • Architecture review