Lacking Support for Zstd Compression When Extracting Container Image Layers #278

@plamen-bardarov

Description

Summary

The layer_source.go file in grootfs currently lacks support for zstd (Zstandard) compression when extracting container image layers. This is a significant limitation as zstd-compressed layers are becoming increasingly common in modern OCI container images, especially with tools like Docker, Podman, and container registries that now support this compression format.

Problem Description

When pulling container images with zstd-compressed layers, grootfs fails to decompress them. The current implementation distinguishes only two cases:

  1. Gzip-compressed tar archives (empty MediaType, or a MediaType containing "gzip")
  2. Everything else, which is passed through as an uncompressed tar archive

There is no handling for zstd-compressed layers, which have media types such as:

  • application/vnd.oci.image.layer.v1.tar+zstd
  • application/vnd.oci.image.layer.nondistributable.v1.tar+zstd

Root Cause Analysis

1. The Blob Method Only Supports Gzip

Looking at the Blob method in layer_source.go (lines 143-155):

blobIDHash := sha256.New()
digestReader := io.NopCloser(io.TeeReader(countingBlobReader, blobIDHash))
if layerInfo.MediaType == "" || strings.Contains(layerInfo.MediaType, "gzip") {
    logger.Debug("uncompressing-blob")

    digestReader, err = gzip.NewReader(digestReader)
    if err != nil {
        return "", 0, errorspkg.Wrapf(err, "expected blob to be of type %s", layerInfo.MediaType)
    }
    defer digestReader.Close()
}

Problem: The condition layerInfo.MediaType == "" || strings.Contains(layerInfo.MediaType, "gzip") only handles:

  • Empty media type (defaults to gzip decompression)
  • Media types containing "gzip"

If a layer has MediaType = "application/vnd.oci.image.layer.v1.tar+zstd", this condition evaluates to false, and the data is treated as uncompressed. This will cause:

  1. Incorrect data written to disk - The compressed zstd data is written as-is without decompression
  2. DiffID checksum mismatch - The diffIDHash will be computed over compressed data instead of uncompressed data, causing the checksum validation to fail
  3. Layer extraction failure - The subsequent tar extraction will fail because the data is still compressed
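The DiffID checksum mismatch in point 2 can be reproduced with a small stdlib sketch. Gzip stands in for zstd here, since the effect is the same whenever decompression is skipped: the digest of the still-compressed bytes never equals the digest of the uncompressed content. The helper name is illustrative, not from grootfs.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
)

// digestsMatchWithoutDecompression hashes the compressed blob directly,
// as grootfs does today for an unrecognized media type, and compares the
// result against the expected diffID (the digest of the uncompressed data).
func digestsMatchWithoutDecompression(content []byte) bool {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(content)
	zw.Close()

	expectedDiffID := sha256.Sum256(content) // digest of uncompressed data
	computed := sha256.Sum256(buf.Bytes())   // digest of compressed bytes
	return expectedDiffID == computed
}

func main() {
	fmt.Println(digestsMatchWithoutDecompression([]byte("layer tar bytes"))) // prints "false"
}
```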

2. The v1DiffID Method Also Only Supports Gzip

Looking at the v1DiffID method (lines 316-334):

func (s *LayerSource) v1DiffID(logger lager.Logger, layer types.BlobInfo, imgSrc types.ImageSource) (digestpkg.Digest, error) {
    blob, _, err := s.getBlobWithRetries(logger, imgSrc, layer)
    if err != nil {
        return "", errorspkg.Wrap(err, "fetching V1 layer blob")
    }
    defer blob.Close()

    gzipReader, err := gzip.NewReader(blob)
    if err != nil {
        return "", errorspkg.Wrap(err, "creating reader for V1 layer blob")
    }

    data, err := io.ReadAll(gzipReader)
    if err != nil {
        return "", errorspkg.Wrap(err, "reading V1 layer blob")
    }
    sha := sha256.Sum256(data)

    return digestpkg.NewDigestFromHex("sha256", hex.EncodeToString(sha[:])), nil
}

Problem: This method unconditionally uses gzip.NewReader() without checking the layer's media type. While this method is specifically for V1 schema images (which historically only used gzip), it demonstrates the hard-coded assumption throughout the codebase.

3. DiffID Validation Logic Analysis

The DiffID validation logic appears correct in design but will fail for zstd layers due to the decompression issue:

// Lines 160-161: Create hash for DiffID
diffIDHash := sha256.New()
digestReader = io.NopCloser(io.TeeReader(digestReader, diffIDHash))

// Lines 175-178: Validate DiffID checksum
if err = s.checkCheckSum(logger, diffIDHash, layerInfo.DiffID); err != nil {
    return "", 0, errorspkg.Wrap(err, "diffID digest mismatch")
}

The logic is:

  1. The diffIDHash is computed by reading the decompressed data through the TeeReader
  2. The computed hash is compared against layerInfo.DiffID (which is the expected hash of uncompressed content)
  3. This will correctly fail for zstd layers because the data was never decompressed

Impact

  1. Cannot pull images with zstd layers - Any OCI image using zstd compression will fail to be pulled
  2. Error messages may be misleading - Users will see "diffID digest mismatch" errors without understanding the root cause is unsupported compression
  3. Growing incompatibility - As more registries and tools adopt zstd (which offers better compression ratios and faster decompression than gzip), this limitation becomes more impactful

Affected Media Types

Per the OCI Image Spec (found in vendor files), the following media types use zstd compression:

// From vendor/github.com/opencontainers/image-spec/specs-go/v1/mediatype.go
MediaTypeImageLayerZstd = "application/vnd.oci.image.layer.v1.tar+zstd"
MediaTypeImageLayerNonDistributableZstd = "application/vnd.oci.image.layer.nondistributable.v1.tar+zstd"
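A media-type check mirroring the existing strings.Contains(layerInfo.MediaType, "gzip") test could look like this sketch (the helper name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// isZstdLayer reports whether a layer's media type indicates zstd
// compression; the "+zstd" suffix covers both the distributable and
// nondistributable OCI layer media types listed above.
func isZstdLayer(mediaType string) bool {
	return strings.HasSuffix(mediaType, "+zstd")
}

func main() {
	fmt.Println(isZstdLayer("application/vnd.oci.image.layer.v1.tar+zstd")) // prints "true"
	fmt.Println(isZstdLayer("application/vnd.oci.image.layer.v1.tar+gzip")) // prints "false"
}
```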

Proposed Solution

Option 1: Add Zstd Decompression Support

Modify the Blob method to handle zstd compression:

import (
    "compress/gzip"
    "github.com/klauspost/compress/zstd"
    // ... other imports
)

// In the Blob method:
blobIDHash := sha256.New()
digestReader := io.NopCloser(io.TeeReader(countingBlobReader, blobIDHash))

switch {
case layerInfo.MediaType == "" || strings.Contains(layerInfo.MediaType, "gzip"):
    logger.Debug("uncompressing-blob-gzip")
    digestReader, err = gzip.NewReader(digestReader)
    if err != nil {
        return "", 0, errorspkg.Wrapf(err, "expected blob to be of type %s", layerInfo.MediaType)
    }
    defer digestReader.Close()

case strings.Contains(layerInfo.MediaType, "zstd"):
    logger.Debug("uncompressing-blob-zstd")
    zstdReader, err := zstd.NewReader(digestReader)
    if err != nil {
        return "", 0, errorspkg.Wrapf(err, "expected blob to be of type %s", layerInfo.MediaType)
    }
    digestReader = zstdReader.IOReadCloser()
    defer digestReader.Close()

default:
    // Uncompressed tar archive - no action needed
    logger.Debug("blob-uncompressed")
}

Option 2: Use Generic Decompression Library

Use a library like github.com/containers/image/v5/pkg/compression which already handles multiple compression formats:

import (
    "github.com/containers/image/v5/pkg/compression"
)

// In the Blob method:
blobIDHash := sha256.New()
digestReader := io.NopCloser(io.TeeReader(countingBlobReader, blobIDHash))

// Detect the compression format from the stream's leading magic bytes and decompress
decompressor, _, err := compression.AutoDecompress(digestReader)
if err != nil {
    return "", 0, errorspkg.Wrap(err, "decompressing blob")
}
digestReader = decompressor
defer digestReader.Close()

Additional Considerations

  1. Dependencies: The github.com/klauspost/compress/zstd package is a well-maintained, pure-Go implementation of zstd that's already used by many container tools.

Acceptance criteria

Support zstd compression
