This section provides actionable security guidance based on real-world deployment experience. Following these recommendations will help you avoid common pitfalls and maintain a secure sandbox environment over time.
These DO's and DON'Ts are derived from security audits and incident analysis. Each item addresses a specific vulnerability that has been exploited in production container environments.
DO:
- Always use gVisor (runsc) for multi-tenant workloads
- Enable `enforce_container_binding: true` in JWT
- Use 9p mounts with `ro` flag for sensitive data
- Drop ALL capabilities then add back minimum required
- Set `no-new-privileges: true`
- Implement network egress whitelist via proxy
- Use ephemeral containers (destroy after session)
- Monitor resource usage continuously
- Implement audit logging for all operations
- Run regular security testing (penetration tests)
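As a rough illustration of how several of these items translate into launch options, the sketch below assembles a `docker run` argument list. It assumes Docker with the `runsc` runtime already registered. The helper name `build_run_args`, the `/data` target path, and the use of a read-only bind mount in place of the 9p mount are illustrative assumptions, not part of any documented setup here.

```python
from typing import List, Sequence


def build_run_args(image: str, data_dir: str,
                   extra_caps: Sequence[str] = ()) -> List[str]:
    """Build a hardened `docker run` command line (hypothetical helper)."""
    args = [
        "docker", "run",
        "--rm",                                      # ephemeral: removed on exit
        "--runtime", "runsc",                        # gVisor instead of the default runtime
        "--cap-drop", "ALL",                         # start from zero capabilities
        "--security-opt", "no-new-privileges=true",  # block privilege escalation via setuid
        # Read-only mount for sensitive data; /data is an assumed target path.
        "--mount", f"type=bind,source={data_dir},target=/data,readonly",
    ]
    # Add back only the capabilities the workload actually needs.
    for cap in extra_caps:
        args += ["--cap-add", cap]
    args.append(image)
    return args


if __name__ == "__main__":
    print(" ".join(build_run_args("sandbox:latest", "/srv/datasets",
                                  extra_caps=["NET_BIND_SERVICE"])))
```

The function only builds the argument list and does not start a container, so it can be reviewed or unit-tested before being passed to a process launcher.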
DON'T:
- Never use default Docker runtime for untrusted code
- Never mount `/var/run/docker.sock` in containers
- Never run containers with `--privileged`
- Never trust user input without validation
- Never allow arbitrary network egress
- Never skip resource limits (memory, CPU, PIDs)
- Never use long-lived JWT tokens (max 4 hours)
- Never expose container management API without auth
- Never store secrets in environment variables (use secrets manager)
- Never disable security features "for convenience"
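Several of these rules can be checked mechanically before a container is launched. The following is a minimal sketch of such a pre-flight check. The spec dictionary shape (`runtime`, `privileged`, `mounts`, `limits`, `token_ttl_seconds`) and the function name `lint_spec` are hypothetical and would need to be adapted to whatever your orchestrator actually uses.

```python
from typing import Any, Dict, List

MAX_TOKEN_TTL_SECONDS = 4 * 60 * 60          # 4-hour ceiling from the list above
REQUIRED_LIMITS = ("memory", "cpu", "pids")  # limits that must always be set


def lint_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations for a (hypothetical) container spec."""
    findings: List[str] = []

    # A missing runtime is treated as the default Docker runtime (runc).
    if spec.get("runtime", "runc") != "runsc":
        findings.append("default runtime used for untrusted code")

    if any(m.get("source") == "/var/run/docker.sock"
           for m in spec.get("mounts", [])):
        findings.append("docker.sock mounted into container")

    if spec.get("privileged", False):
        findings.append("container runs with --privileged")

    limits = spec.get("limits", {})
    for name in REQUIRED_LIMITS:
        if name not in limits:
            findings.append(f"missing {name} limit")

    if spec.get("token_ttl_seconds", 0) > MAX_TOKEN_TTL_SECONDS:
        findings.append("JWT lifetime exceeds 4 hours")

    return findings
```

An empty result means none of the checked rules were violated. It does not cover the items that need human judgment, such as input validation or egress policy.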
Despite best efforts, security incidents may occur. This section provides a tiered response framework that helps you quickly assess severity and take appropriate action without over- or under-reacting.
Early detection is critical. These metrics indicate potential security events that require investigation:
- OOM kills (>5/hour = investigate)
- Failed auth attempts (>10/minute = alert)
- Network blocks (>100/minute = analyze)
- Container escape attempts (any = CRITICAL)
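The thresholds above can be encoded as a small lookup table. This sketch assumes the metrics have already been aggregated over the stated windows by your monitoring stack. The metric key names and the `evaluate` function are illustrative, not taken from any particular tool.

```python
from typing import Dict, List, Tuple

# metric name -> (threshold, action); a metric triggers when it is
# strictly greater than its threshold, matching the ">" in the list above.
THRESHOLDS: Dict[str, Tuple[float, str]] = {
    "oom_kills_per_hour": (5, "investigate"),
    "failed_auth_per_minute": (10, "alert"),
    "network_blocks_per_minute": (100, "analyze"),
    "escape_attempts": (0, "CRITICAL"),  # any attempt at all is critical
}


def evaluate(metrics: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (metric, action) pairs for every threshold that was exceeded."""
    triggered = []
    for name, (limit, action) in THRESHOLDS.items():
        # Metrics that were not reported are treated as zero.
        if metrics.get(name, 0) > limit:
            triggered.append((name, action))
    return triggered
```

For example, `evaluate({"oom_kills_per_hour": 6, "failed_auth_per_minute": 10})` flags only the OOM kills, because exactly 10 failed attempts per minute does not exceed the threshold.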
Respond in proportion to severity. Not every alert requires an emergency shutdown: over-reacting can cause unnecessary downtime, while under-reacting can allow breaches to escalate.
Level 1: Suspicious Activity
At this stage, indicators suggest possible reconnaissance or probing. Continue operations but increase monitoring to gather more data before taking disruptive action.
- Increase logging verbosity
- Review audit logs
- Monitor resource usage
- Continue operation with heightened monitoring
Level 2: Confirmed Attack
Evidence confirms an active breach or successful exploitation. Take immediate action to contain the damage while preserving forensic evidence for investigation.
- Isolate affected container(s)
- Capture memory dump for forensics
- Block source IP/user
- Rotate JWT signing keys
- Review all sessions from affected user
Level 3: System Compromise
Full system compromise detected - the attacker has gained significant access or the integrity of the entire cluster is in question. Execute emergency procedures immediately.
- Trigger emergency shutdown
- Isolate entire cluster
- Preserve evidence (logs, memory dumps, network captures)
- Notify security team
- Begin forensic investigation
- Restore from known-good backup
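One way to keep the three levels actionable is to store them as a playbook that on-call tooling can look up. The sketch below simply mirrors the lists above. The `Level` enum and `actions_for` helper are hypothetical names, and deciding which level applies remains a human judgment that this code does not attempt.

```python
from enum import IntEnum
from typing import Dict, List


class Level(IntEnum):
    SUSPICIOUS_ACTIVITY = 1
    CONFIRMED_ATTACK = 2
    SYSTEM_COMPROMISE = 3


# Ordered actions per level, copied from the response levels above.
PLAYBOOK: Dict[Level, List[str]] = {
    Level.SUSPICIOUS_ACTIVITY: [
        "Increase logging verbosity",
        "Review audit logs",
        "Monitor resource usage",
        "Continue operation with heightened monitoring",
    ],
    Level.CONFIRMED_ATTACK: [
        "Isolate affected container(s)",
        "Capture memory dump for forensics",
        "Block source IP/user",
        "Rotate JWT signing keys",
        "Review all sessions from affected user",
    ],
    Level.SYSTEM_COMPROMISE: [
        "Trigger emergency shutdown",
        "Isolate entire cluster",
        "Preserve evidence (logs, memory dumps, network captures)",
        "Notify security team",
        "Begin forensic investigation",
        "Restore from known-good backup",
    ],
}


def actions_for(level: Level) -> List[str]:
    """Return a copy of the ordered checklist for the given response level."""
    return list(PLAYBOOK[level])


if __name__ == "__main__":
    for step, action in enumerate(actions_for(Level.CONFIRMED_ATTACK), start=1):
        print(f"{step}. {action}")
```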
After containment, focus on learning from the incident and preventing recurrence. The post-incident phase is critical for improving your security posture.
- Root cause analysis
- Update security controls
- Patch vulnerabilities
- Update documentation
- Team training
Security is not a one-time setup but an ongoing process. This maintenance schedule ensures your sandbox remains secure as new vulnerabilities are discovered and your infrastructure evolves.
Daily:
- Monitor active container count
- Review failed authentication attempts
- Check OOM kills
- Verify network proxy health
Weekly:
- Review audit logs
- Test backup restoration
- Check for CVEs in base images
- Update dependencies
Monthly:
- Penetration testing
- Security audit
- Review and update whitelist
- Rotate JWT signing keys
- Update documentation
Quarterly:
- Third-party security assessment
- Disaster recovery drill
- Compliance audit
- Architecture review