Layer 2: Filesystem Isolation

Configuration Example

The 9P protocol enables fine-grained filesystem access control between the host and the container. Unlike standard bind mounts, which expose the host kernel's filesystem view directly, 9P acts as a translation layer that can enforce access policies on every file operation.

Mount Analysis - What these flags mean:

# Read-only 9p mounts (secure data)
none on /mnt/skills/public type 9p (ro,trans=fd,rfdno=X,wfdno=Y,...)
none on /mnt/user-data/uploads type 9p (ro,trans=fd,rfdno=X,wfdno=Y,...)

# Read-write 9p mount (user outputs only)
none on /mnt/user-data/outputs type 9p (rw,trans=fd,rfdno=X,wfdno=Y,...)

# Key security flags:
# - disable_file_handle_sharing
# - disable_fifo_open
# - directfs (controlled access)
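To see which of these mounts a running sandbox actually has, the mount table can be filtered for 9p entries. The sketch below is a hypothetical helper (`list_9p_mounts` is not part of any tool) that reads `mount`-style lines on stdin and prints each 9p mount point with its access mode. It assumes the `SRC on PATH type 9p (OPTS)` layout shown above and mount paths without spaces.

```shell
# Hypothetical helper: summarize 9p mounts from `mount`-style output on stdin.
# Assumes lines of the form: SRC on PATH type 9p (opt1,opt2,...)
list_9p_mounts() {
  awk '$2 == "on" && $4 == "type" && $5 == "9p" {
    opts = $6
    gsub(/[()]/, "", opts)              # strip the surrounding parentheses
    n = split(opts, parts, ",")
    mode = "unknown"
    for (i = 1; i <= n; i++) {
      if (parts[i] == "ro" || parts[i] == "rw") { mode = parts[i] }
    }
    print $3, mode                      # e.g. "/mnt/skills/public ro"
  }'
}
```

Inside the container this could be run as `mount | list_9p_mounts`; any path other than the outputs directory that reports `rw` deserves a closer look.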

About the 9P Protocol

9P (also called Styx or Plan 9 Protocol) is a network protocol originally designed for Plan 9 from Bell Labs. It treats files as the fundamental unit of interaction, making it ideal for controlled filesystem access in sandboxed environments.

Why 9P instead of NFS or bind mounts?

  • Works well with gVisor
  • Fine-grained access control
  • Plan 9 protocol - designed for distributed systems
  • Prevents symlink attacks when configured properly

Setup Guide

Setting up 9P involves configuring a server on the host that exports directories, then mounting those directories inside containers. The server controls access permissions while the mount options enforce security policies.

1. Setup 9P Server (Host Side)

Install diod (9P server):

sudo apt-get install diod

File: /etc/diod.conf

# Listen on TCP, loopback only (127.0.0.1, port 564)
listen 127.0.0.1:564

# Export directories with different permissions
exports {
    # Read-only skills directory
    path "/srv/sandbox/skills" {
        uname = "nobody"
        ro = true
    }

    # Read-only uploads (per-user)
    path "/srv/sandbox/users/%u/uploads" {
        uname = "%u"
        ro = true
    }

    # Read-write outputs (per-user)
    path "/srv/sandbox/users/%u/outputs" {
        uname = "%u"
        rw = true
    }
}

# Security options
options {
    userdb = "/etc/passwd"
    allsquash = true
    squashuser = "nobody"
}

Start diod:

sudo systemctl enable diod
sudo systemctl start diod

2. Mount 9P in Container

Once the 9P server is running on the host, mount the exported directories inside the container. Docker's -v option has no 9p mode (an unrecognized mode is rejected), so use the standard :ro and :rw flags; under the gVisor runtime (runsc), the sandbox reaches these bind mounts through gVisor's own filesystem mediation rather than directly.

Docker run example:

docker run --runtime=runsc \
  --device=/dev/fuse \
  -v /srv/sandbox/skills:/mnt/skills:ro \
  -v "/srv/sandbox/users/${USER_ID}/uploads:/mnt/uploads:ro" \
  -v "/srv/sandbox/users/${USER_ID}/outputs:/mnt/outputs:rw" \
  your-sandbox-image

Kubernetes Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: secure-sandbox
spec:
  runtimeClassName: gvisor
  containers:
    - name: sandbox
      image: your-sandbox-image
      volumeMounts:
        - name: skills
          mountPath: /mnt/skills
          readOnly: true
        - name: user-uploads
          mountPath: /mnt/uploads
          readOnly: true
        - name: user-outputs
          mountPath: /mnt/outputs
          readOnly: false
  volumes:
    - name: skills
      hostPath:
        path: /srv/sandbox/skills
        type: Directory
    - name: user-uploads
      hostPath:
        path: /srv/sandbox/users/{{ user_id }}/uploads
        type: Directory
    - name: user-outputs
      hostPath:
        path: /srv/sandbox/users/{{ user_id }}/outputs
        type: Directory

3. Critical Security Flags

These mount flags are the baseline for secure 9P operation. Each flag addresses a specific attack vector, and omitting a required one could allow container escape or privilege escalation; noexec and cache=none are optional hardening.

Mount options to ALWAYS use:

# Mount with security flags.
# A comment cannot follow a line-continuation backslash, so each option is
# described here rather than inline:
#   ro                           read-only (for sensitive data)
#   nodev                        no device files
#   nosuid                       no SUID binaries
#   noexec                       no execution (optional)
#   disable_file_handle_sharing  prevent file handle leaks
#   disable_fifo_open            prevent FIFO attacks
#   cache=none                   no caching (optional, impacts performance)
mount -t 9p -o trans=fd,rfdno=X,wfdno=Y,ro,nodev,nosuid,noexec,disable_file_handle_sharing,disable_fifo_open,cache=none \
  /host/path /container/path
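Because a single missing flag weakens the mount, it can help to check an option string before it is used. The following is a minimal sketch, not an established tool: `missing_mount_opts` is a hypothetical name, and the list of required flags simply mirrors the ones this section treats as mandatory.

```shell
# Hypothetical helper: print the required options that are absent from a
# comma-separated mount option string, and fail if any are missing.
# The body runs in a subshell so its variables do not leak to the caller.
missing_mount_opts() (
  opts=",$1,"        # pad with commas so every option can be matched whole
  missing=""
  for opt in nodev nosuid disable_file_handle_sharing disable_fifo_open; do
    case "$opts" in
      *",$opt,"*) ;;                                   # present
      *) missing="${missing:+$missing,}$opt" ;;        # record as missing
    esac
  done
  printf '%s\n' "$missing"
  [ -z "$missing" ]
)
```

A wrapper script could call `missing_mount_opts "$OPTS" || exit 1` before invoking mount, which turns a silent omission into a hard failure.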

Directory Layout

The following directory structure separates shared resources (skills) from user-specific data (uploads and outputs). This layout ensures users can only write to their own output directories while accessing shared resources read-only.

Host filesystem:

flowchart TD
    root[/srv/sandbox/]
    skills[skills/]
    users[users/]
    public[public/]
    examples[examples/]
    u1[user_001/]
    u2[user_002/]
    docx[docx/]
    pdf[pdf/]
    u1_up[uploads/ <br/>read-only]
    u1_out[outputs/ <br/>read-write]
    u1_meta[metadata.json]
    u2_up[uploads/]
    u2_out[outputs/]

    root --> skills
    root --> users
    skills --> public
    skills --> examples
    public --> docx
    public --> pdf
    users --> u1
    users --> u2
    u1 --> u1_up
    u1 --> u1_out
    u1 --> u1_meta
    u2 --> u2_up
    u2 --> u2_out

    style skills fill:#0288d1,color:#fff
    style u1_up fill:#f57c00,color:#fff
    style u1_out fill:#388e3c,color:#fff
    style u2_up fill:#f57c00,color:#fff
    style u2_out fill:#388e3c,color:#fff
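The layout in the diagram can be created with a few mkdir calls. This is a sketch under assumptions: `make_sandbox_layout` is a hypothetical helper, the root path is passed in rather than hard-coded, and ownership and permissions are left for a later step.

```shell
# Hypothetical helper: create the directory layout shown above.
# Usage: make_sandbox_layout ROOT USER_ID...
make_sandbox_layout() (
  set -e
  root=$1
  shift
  # Shared, read-only resources
  mkdir -p "$root/skills/public/docx" "$root/skills/public/pdf" "$root/skills/examples"
  # Per-user upload (read-only) and output (read-write) directories
  for user in "$@"; do
    mkdir -p "$root/users/$user/uploads" "$root/users/$user/outputs"
  done
)
```

For the tree above, the call would be `make_sandbox_layout /srv/sandbox user_001 user_002`.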

Preventing Symlink Attacks

Symlink attacks attempt to access files outside the mounted directory by creating symbolic links whose targets lie outside it (e.g., ln -s /etc/shadow ./shadow). Without proper protection, any process that follows such a link could read sensitive host files or cross the isolation boundary.

Test your configuration:

# Inside container - try to escape via symlink
ln -s /etc/shadow /mnt/outputs/shadow_link

# Should fail with "Operation not permitted" ("Read-only file system" applies only on the read-only mounts)

Proper 9P configuration prevents:

  • Symlinks pointing outside mounted directory
  • Hard links across devices
  • FIFO/socket file creation
  • Device file creation
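Host-side tools that read files from an outputs directory can add their own guard by resolving each path before opening it. The sketch below assumes GNU `readlink -f` is available; `resolves_inside` is a hypothetical helper, and it is not meant to be used with `/` as the base directory.

```shell
# Hypothetical helper: succeed only if PATH resolves to a location inside BASE.
# Usage: resolves_inside BASE PATH
# Exit status: 0 inside, 1 outside, 2 if a path could not be resolved.
resolves_inside() (
  base=$(readlink -f "$1") || exit 2      # canonical base directory
  target=$(readlink -f "$2") || exit 2    # canonical target, symlinks followed
  case "$target/" in
    "$base"/*) exit 0 ;;                  # target is BASE itself or below it
    *) exit 1 ;;
  esac
)
```

A link created with `ln -s /etc/shadow shadow_link` would resolve to `/etc/shadow`, fall outside the base directory, and be rejected before the host ever opens it.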

Example Mount Script

This script automates the 9P mounting process. It uses file descriptor passing to establish secure communication channels between the host 9P server and the container.

File: mount-sandbox-volumes.sh

#!/usr/bin/env bash
# Usage: mount-sandbox-volumes.sh USER_ID CONTAINER_PID   (run as root on the host)
set -euo pipefail

USER_ID=${1:?usage: $0 USER_ID CONTAINER_PID}
# /proc/<pid>/root is indexed by process ID, so the second argument must be
# the PID of the container's init process, not a container name or ID.
CONTAINER_PID=${2:?usage: $0 USER_ID CONTAINER_PID}
CONTAINER_ROOT="/proc/${CONTAINER_PID}/root"
USER_DIR="/srv/sandbox/users/${USER_ID}"

# File descriptor numbers passed from container runtime
SKILL_RFD=${SKILL_RFD:-6}
SKILL_WFD=${SKILL_WFD:-7}
UPLOAD_RFD=${UPLOAD_RFD:-8}
UPLOAD_WFD=${UPLOAD_WFD:-9}
OUTPUT_RFD=${OUTPUT_RFD:-10}
OUTPUT_WFD=${OUTPUT_WFD:-11}

# Create user directories if they don't exist
mkdir -p "${USER_DIR}/uploads" "${USER_DIR}/outputs"
chown -R nobody:nogroup "${USER_DIR}"

# Mount skills (shared, read-only)
mount -t 9p -o "trans=fd,rfdno=${SKILL_RFD},wfdno=${SKILL_WFD},ro,nodev,nosuid,disable_file_handle_sharing" \
  /srv/sandbox/skills "${CONTAINER_ROOT}/mnt/skills"

# Mount user uploads (read-only)
mount -t 9p -o "trans=fd,rfdno=${UPLOAD_RFD},wfdno=${UPLOAD_WFD},ro,nodev,nosuid,disable_file_handle_sharing" \
  "${USER_DIR}/uploads" "${CONTAINER_ROOT}/mnt/uploads"

# Mount user outputs (read-write, but controlled)
mount -t 9p -o "trans=fd,rfdno=${OUTPUT_RFD},wfdno=${OUTPUT_WFD},rw,nodev,nosuid,disable_file_handle_sharing" \
  "${USER_DIR}/outputs" "${CONTAINER_ROOT}/mnt/outputs"

echo "Mounted volumes for user ${USER_ID} in container PID ${CONTAINER_PID}"
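The script interpolates its first argument directly into host paths, so a value such as `../../etc` would point the mkdir, chown, and mount calls at the wrong directory. One way to guard against that is to validate the ID before using it. This is a sketch: `valid_user_id` is a hypothetical function, and the allowed character set (letters, digits, `_`, `-`) is an assumption to adjust to your own ID scheme.

```shell
# Hypothetical validator: accept only non-empty IDs made of letters, digits,
# '_' and '-'. Anything containing '/', '.', spaces, etc. is rejected.
valid_user_id() {
  case "$1" in
    '' | *[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}
```

In the mount script this could run right after argument parsing, for example `valid_user_id "$USER_ID" || { echo "invalid user id" >&2; exit 1; }`.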

CRITICAL: Common Mistakes

These mistakes bypass the 9P protection entirely, exposing your host filesystem directly to containers. Never use these patterns in production.

Incorrect - Mounting host directories directly:

# INSECURE - no 9p, no isolation
docker run -v /host/data:/data ubuntu

Incorrect - Forgetting read-only flag:

# INSECURE - skills should be ro
mount /skills /mnt/skills  # Missing 'ro' flag

Correct approach:

# SECURE - 9p, read-only, with the security flags from step 3
mount -t 9p -o trans=fd,rfdno=X,wfdno=Y,ro,nodev,nosuid,disable_file_handle_sharing,disable_fifo_open /skills /mnt/skills