Resource limits are enforced at multiple levels: process-manager flags (application level), cgroups (kernel level), and Docker/Kubernetes settings (orchestrator level). This layered enforcement means a limit that is misconfigured or bypassed at one layer is still caught at another.
# From process manager flags
--memory-limit-bytes 4294900000 # ~4GB RAM
--cpu-shares 1024 # CPU allocation
--oom-poll-interval-ms 50 # OOM detection
# From cgroups
/sys/fs/cgroup/memory/container_*/memory.limit_in_bytes: 4GB
/sys/fs/cgroup/cpu/container_*/cpu.shares: 1024

cgroups (control groups) are a Linux kernel feature that limits, accounts for, and isolates resource usage (CPU, memory, disk I/O, network) for a collection of processes. Without cgroups, a container could consume all host resources and cause a denial of service.
Many production systems still run cgroups v1 for backward compatibility, and the examples below use the v1 hierarchy. For new deployments, consider cgroups v2, which offers a unified hierarchy and better resource management.
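To see which version a host is running, check the filesystem type mounted at /sys/fs/cgroup; a quick sketch:

# cgroup2fs means cgroups v2 (unified hierarchy); tmpfs means the v1 per-controller layout
stat -fc %T /sys/fs/cgroup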
The systemd template unit below starts one sandbox container per user (%i expands to the instance name) and hooks the cgroup setup and cleanup scripts into the container lifecycle.

File: /etc/systemd/system/sandbox-container@.service
[Unit]
Description=Sandbox Container for User %i
After=docker.service
Requires=docker.service
[Service]
# docker run stays in the foreground, so "simple" is the correct type (forking would hang startup)
Type=simple
User=root
ExecStartPre=/usr/local/bin/setup_cgroups.sh %i
ExecStart=/usr/bin/docker run \
--runtime=runsc \
--name=sandbox_%i \
--cgroup-parent=/sandbox/container_%i \
--memory=4g \
--memory-swap=4g \
--cpu-shares=1024 \
--cpus=2.0 \
--pids-limit=100 \
--ulimit nofile=20000:20000 \
your-sandbox-image
ExecStop=/usr/bin/docker stop sandbox_%i
ExecStopPost=/usr/local/bin/cleanup_cgroups.sh %i
[Install]
WantedBy=multi-user.target
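Enable one instance per sandbox user; a usage sketch (the instance name alice is a placeholder):

# Reload unit files, then start and enable a sandbox for user "alice"
systemctl daemon-reload
systemctl enable --now sandbox-container@alice.service

File: setup_cgroups.sh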
#!/bin/bash
# Set up cgroup (v1) limits for a sandbox container
CONTAINER_ID=${1:?usage: setup_cgroups.sh <container-id>}
# Create cgroup directories
mkdir -p /sys/fs/cgroup/memory/sandbox/container_${CONTAINER_ID}
mkdir -p /sys/fs/cgroup/cpu/sandbox/container_${CONTAINER_ID}
mkdir -p /sys/fs/cgroup/cpuacct/sandbox/container_${CONTAINER_ID}
mkdir -p /sys/fs/cgroup/pids/sandbox/container_${CONTAINER_ID}
mkdir -p /sys/fs/cgroup/devices/sandbox/container_${CONTAINER_ID}
# Memory limits (4GB)
echo 4294900000 > /sys/fs/cgroup/memory/sandbox/container_${CONTAINER_ID}/memory.limit_in_bytes
echo 4294900000 > /sys/fs/cgroup/memory/sandbox/container_${CONTAINER_ID}/memory.memsw.limit_in_bytes # memory+swap; requires swap accounting (swapaccount=1)
echo 0 > /sys/fs/cgroup/memory/sandbox/container_${CONTAINER_ID}/memory.oom_control # 0 keeps the OOM killer enabled (writing 1 would disable it)
# CPU limits
echo 1024 > /sys/fs/cgroup/cpu/sandbox/container_${CONTAINER_ID}/cpu.shares
echo 100000 > /sys/fs/cgroup/cpu/sandbox/container_${CONTAINER_ID}/cpu.cfs_period_us
echo 200000 > /sys/fs/cgroup/cpu/sandbox/container_${CONTAINER_ID}/cpu.cfs_quota_us # 200000us quota per 100000us period = 2 CPUs
# PID limits
echo 100 > /sys/fs/cgroup/pids/sandbox/container_${CONTAINER_ID}/pids.max
# Device whitelist: deny everything, then allow devices one rule per write
# (devices.allow accepts a single rule per write, and no trailing comments)
DEVICES_CGROUP=/sys/fs/cgroup/devices/sandbox/container_${CONTAINER_ID}
echo "a" > ${DEVICES_CGROUP}/devices.deny
echo "c 1:3 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/null
echo "c 1:5 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/zero
echo "c 1:7 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/full
echo "c 1:8 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/random
echo "c 1:9 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/urandom
echo "c 5:0 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/tty
echo "c 5:2 rwm" > ${DEVICES_CGROUP}/devices.allow   # /dev/ptmx
echo "c 136:* rwm" > ${DEVICES_CGROUP}/devices.allow # /dev/pts/*
echo "Cgroups configured for container ${CONTAINER_ID}"For Docker Compose deployments, resource limits are defined in the service definition. These settings apply to containers created by the Docker daemon.
File: docker-compose.yml
version: '3.8'

services:
  sandbox:
    image: your-sandbox-image
    runtime: runsc

    # Resource limits (deploy.resources is honored by Swarm and by recent
    # Compose Spec versions; the mem_*/cpu_* keys below are the classic
    # Compose equivalents)
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G

    # Ulimits
    ulimits:
      nofile:
        soft: 20000
        hard: 20000
      nproc:
        soft: 100
        hard: 100

    # PID limit
    pids_limit: 100

    # Memory settings
    mem_limit: 4g
    mem_reservation: 1g
    memswap_limit: 4g
    oom_kill_disable: false

    # CPU settings
    cpu_shares: 1024
    cpu_quota: 200000
    cpu_period: 100000

In Kubernetes, resource limits are defined in the Pod specification. The runtimeClassName: gvisor field is critical: it ensures the Pod runs under gVisor instead of the default container runtime.
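runtimeClassName: gvisor only resolves if a matching RuntimeClass object exists in the cluster; a minimal sketch, assuming gVisor's runsc handler is installed on the nodes:

# Register a RuntimeClass named "gvisor" backed by the runsc handler
kubectl apply -f - <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF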
File: sandbox-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-pod
  labels:
    app: sandbox
spec:
  runtimeClassName: gvisor
  containers:
    - name: sandbox
      image: your-sandbox-image
      resources:
        requests:
          memory: '1Gi'
          cpu: '500m'
          ephemeral-storage: '2Gi'
        limits:
          memory: '4Gi'
          cpu: '2000m'
          ephemeral-storage: '10Gi'
      # Container-level security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        readOnlyRootFilesystem: false
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE
  # Pod-level settings
  securityContext:
    fsGroup: 1000
    sysctls:
      - name: net.ipv4.ip_unprivileged_port_start
        value: '0'
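Apply the manifest and confirm the Pod actually landed on gVisor; a quick check sketch (assumes dmesg is available in the image; gVisor announces itself in the sandboxed kernel log):

kubectl apply -f sandbox-pod.yaml
kubectl exec sandbox-pod -- dmesg | head -n 1 # expect a "Starting gVisor..." banner

Continuously monitor container resource usage to detect anomalies and impending resource exhaustion. The script below reads the cgroup files directly and reports memory, CPU, and PID usage in near-real time.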
File: monitor-resources.sh
#!/bin/bash
# Monitor container resource usage by polling the (v1) cgroup files
CONTAINER_ID=$1
CGROUP_PATH="/sys/fs/cgroup"

while true; do
    echo "=== Resource Usage for ${CONTAINER_ID} ==="
    echo "Timestamp: $(date)"

    # Memory usage
    MEMORY_USED=$(cat ${CGROUP_PATH}/memory/sandbox/container_${CONTAINER_ID}/memory.usage_in_bytes)
    MEMORY_LIMIT=$(cat ${CGROUP_PATH}/memory/sandbox/container_${CONTAINER_ID}/memory.limit_in_bytes)
    MEMORY_PCT=$((MEMORY_USED * 100 / MEMORY_LIMIT))
    echo "Memory: ${MEMORY_USED} / ${MEMORY_LIMIT} (${MEMORY_PCT}%)"

    # Cumulative CPU time consumed by the cgroup
    CPU_USAGE=$(cat ${CGROUP_PATH}/cpuacct/sandbox/container_${CONTAINER_ID}/cpuacct.usage)
    echo "CPU usage (nanoseconds): ${CPU_USAGE}"

    # PID count
    PID_CURRENT=$(cat ${CGROUP_PATH}/pids/sandbox/container_${CONTAINER_ID}/pids.current)
    PID_MAX=$(cat ${CGROUP_PATH}/pids/sandbox/container_${CONTAINER_ID}/pids.max)
    echo "PIDs: ${PID_CURRENT} / ${PID_MAX}"

    # Check for OOM kills; match the "oom_kill" field exactly, since a plain
    # grep would also match the "oom_kill_disable" line
    OOM_COUNT=$(awk '$1 == "oom_kill" {print $2}' ${CGROUP_PATH}/memory/sandbox/container_${CONTAINER_ID}/memory.oom_control)
    if [ "${OOM_COUNT:-0}" != "0" ]; then
        echo "[WARNING] OOM kills: ${OOM_COUNT}"
    fi

    echo ""
    sleep 5
done
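Run it with the same container ID the setup script used; a usage sketch (ID 123 is a placeholder):

./monitor-resources.sh 123

Different workloads require different resource allocations. These recommendations balance performance with security, giving each workload enough resources for its task while preventing abuse.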
| Use Case | Memory | CPU | PIDs | Disk |
|---|---|---|---|---|
| Code Execution | 2-4GB | 1-2 cores | 100 | 5GB |
| Document Processing | 4-8GB | 2-4 cores | 50 | 10GB |
| AI/ML Inference | 8-16GB | 4-8 cores | 100 | 20GB |
| Web Browser | 4-6GB | 2-3 cores | 200 | 5GB |
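As a concrete example, the Code Execution row maps onto docker run flags as follows; a sketch (your-sandbox-image is a placeholder, and --storage-opt size= requires a storage driver with quota support, such as overlay2 on XFS with pquota):

docker run \
  --memory=4g --memory-swap=4g \
  --cpus=2.0 \
  --pids-limit=100 \
  --storage-opt size=5G \
  your-sandbox-image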
When memory is exhausted, the Linux OOM killer terminates processes to free resources. These settings ensure critical host processes (like SSH and systemd) are protected while sandbox containers are prioritized for termination.
Prevent the OOM killer from targeting critical host processes:
# Set OOM score adjustment (lower = less likely to be killed)
echo -1000 > /proc/1/oom_score_adj # protect systemd (always PID 1)
for pid in $(pidof sshd); do echo -1000 > /proc/${pid}/oom_score_adj; done # protect SSH (pidof may return several PIDs)
# Container workloads get a higher OOM score (more likely to be killed first)
echo 1000 > /proc/$(pidof container_process)/oom_score_adj # container_process is a placeholder for the sandboxed process name
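These /proc writes do not persist across restarts. For daemons managed by systemd, the same protection can be made permanent with the OOMScoreAdjust= unit setting; a minimal drop-in sketch:

# /etc/systemd/system/sshd.service.d/oom.conf
[Service]
OOMScoreAdjust=-1000

After configuring limits, verify they are actually enforced. These tests confirm that a container cannot exceed its allocated resources and that resource abuse is properly contained.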
# Test memory limit (requires the stress tool inside the image)
docker exec sandbox bash -c 'stress --vm 1 --vm-bytes 5G --timeout 10s'
# Should fail: the worker is OOM-killed (limit is 4GB)

# Test CPU limit
docker exec sandbox bash -c 'stress --cpu 4 --timeout 10s'
# Should be throttled to at most 2 CPUs of total usage

# Test PID limit
docker exec sandbox bash -c 'fork_bomb(){ fork_bomb|fork_bomb & }; fork_bomb'
# Fork attempts should start failing once the cgroup hits 100 PIDs