- Table of Contents
- 0. Overview
- 1. Package Compression Overview
- 2. Strategy Pattern Interfaces
- 3. Interface Granularity and Composition
- 4. In-Memory Compression Methods
- 5. Streaming Compression Methods
- 6. File-Based Compression Methods
- 7. Compression Information and Status
- 7.1 Compression Information Structure Reference
- 7.2 Compression Status Methods
- 7.2.1 Package Compression Query Methods
- 7.2.2 Package GetPackageCompressionInfo Method
- 7.2.3 Package IsPackageCompressed Method
- 7.2.4 Package GetPackageCompressionType Method
- 7.2.5 Package GetPackageCompressionRatio Method
- 7.2.6 Package GetPackageOriginalSize Method
- 7.2.7 Package GetPackageCompressedSize Method
- 7.2.8 Package Compression Control Methods
- 7.2.9 Metadata Index Methods
- 7.2.10 Generic Compression Methods
- 7.3 Internal Compression Methods
- 8. Concurrency Patterns and Thread Safety
- 9. Compression Configuration Patterns
- 10. Compression and Signing Relationship
- 11. CompressionStrategy Selection
- 12. Error Handling
- 13. Modern Best Practices
- 14. Structured Error System
This document defines the NovusPack package compression API, providing methods for compressing and decompressing package content while maintaining package integrity and signature compatibility.
- Core Package Interface API - Package operations and compression
- Package Writing API - SafeWrite, FastWrite, and write strategy selection
- File Format Specifications - .nvpk format structure and signature implementation
- File Compression API - Individual file compression operations (FileEntry.Compress, Package.CompressFile, etc.)
- Security and Encryption - Comprehensive security architecture and encryption implementation
- Generic Types and Patterns - Generic concurrency patterns and type-safe configuration
- Streaming and Buffer Management - Streaming concurrency patterns and buffer management
All public methods in the NovusPack Compression API accept context.Context as the first parameter to support:
- Request cancellation and timeout handling
- Request-scoped values and configuration
- Graceful shutdown and resource cleanup
- Integration with Go's standard context patterns
This follows the standard Go convention of passing context.Context as the first parameter and keeps the API compatible with modern Go applications and frameworks.
Package compression in NovusPack compresses package content using separate compression for metadata and data blocks, while preserving the header, metadata index, package comment, and signatures in an uncompressed state for direct access. This enables selective decompression of metadata without requiring full package decompression.
This section describes the scope of compression operations.
When package compression is enabled (header flags bits 15-8 != 0), the following content is compressed:
- FileEntry metadata: Each FileEntry (64 bytes + variable data) is compressed individually using LZ4 compression
- File data: Each file's data is compressed individually using the package compression type (Zstd, LZ4, or LZMA)
- File index: The regular file index is compressed as a single block using LZ4 compression
Note: Package compression compresses all file data as part of the package structure. For compressing individual files within a package (without compressing the entire package), see File Compression API.
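The "header flags bits 15-8" check can be illustrated with a shift and a mask. This is only a sketch: it assumes the flags field is a 32-bit unsigned integer, and the helper names `compressionTypeFromFlags` and `withCompressionType` are hypothetical, not part of the NovusPack API.

```go
package main

import "fmt"

// Compression type values as listed in this document.
const (
    CompressionNone = 0
    CompressionZstd = 1
    CompressionLZ4  = 2
    CompressionLZMA = 3
)

// compressionTypeFromFlags extracts bits 15-8 of the header flags.
// Assumption: the flags field is a 32-bit unsigned integer.
func compressionTypeFromFlags(flags uint32) uint8 {
    return uint8((flags >> 8) & 0xFF)
}

// withCompressionType returns flags with bits 15-8 replaced by t.
// All other bits are left unchanged.
func withCompressionType(flags uint32, t uint8) uint32 {
    return (flags &^ 0xFF00) | uint32(t)<<8
}

func main() {
    flags := withCompressionType(0x0001, CompressionLZ4)
    t := compressionTypeFromFlags(flags)
    fmt.Printf("flags=%#x type=%d compressed=%t\n", flags, t, t != CompressionNone)
}
```

A non-zero result from `compressionTypeFromFlags` corresponds to "package compression is enabled" in the text above.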
Special metadata files (types 65000-65535) are handled as regular FileEntry objects:
- FileEntry metadata compressed with LZ4 (same as all FileEntry metadata)
- File data (YAML content) compressed with LZ4 for fast access
- Note that this is only a requirement for fully-compressed packages; special metadata files can also be stored uncompressed when NOT implementing full package compression
The following content remains uncompressed for direct access:
- Package header (see Package File Format - Package Header)
- Metadata index (see Package File Format - Metadata Index Section) - enables fast access to compressed blocks
- Package comment
- Digital signatures
const (
    CompressionNone = 0 // No compression
    CompressionZstd = 1 // Zstd compression
    CompressionLZ4 = 2 // LZ4 compression
    CompressionLZMA = 3 // LZMA compression
)

// PackageCompressionInfo contains package compression details
type PackageCompressionInfo struct {
    Type uint8 // Compression type (0-3)
    IsCompressed bool // Whether package is compressed
    OriginalSize int64 // Original package size before compression
    CompressedSize int64 // Compressed package size
    Ratio float64 // Compression ratio (0.0-1.0)
}

- Signed Package Restriction: Packages with signatures cannot be compressed
- Compression Before Signing: Packages must be compressed before signing
- Header Immutability: Once compressed, the header becomes immutable
- Metadata Index Location: Metadata index is located at fixed offset 112 bytes (immediately after header) when compression is enabled
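As a rough illustration of how the PackageCompressionInfo fields relate, the sketch below fills the struct from two sizes. It assumes that Ratio means compressed size divided by original size, which fits the documented 0.0-1.0 range but is not stated explicitly; `newCompressionInfo` is a hypothetical helper.

```go
package main

import "fmt"

// PackageCompressionInfo mirrors the struct defined in this document.
type PackageCompressionInfo struct {
    Type           uint8
    IsCompressed   bool
    OriginalSize   int64
    CompressedSize int64
    Ratio          float64
}

// newCompressionInfo is a hypothetical helper.
// Assumption: Ratio = CompressedSize / OriginalSize.
func newCompressionInfo(t uint8, original, compressed int64) PackageCompressionInfo {
    info := PackageCompressionInfo{
        Type:           t,
        IsCompressed:   t != 0,
        OriginalSize:   original,
        CompressedSize: compressed,
    }
    if original > 0 {
        info.Ratio = float64(compressed) / float64(original)
    }
    return info
}

func main() {
    info := newCompressionInfo(1, 1000, 250)
    fmt.Printf("%+v\n", info)
}
```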
The compression API supports pluggable compression algorithms through the strategy pattern.
This section describes the CompressionStrategy interface.
// CompressionStrategy extends Strategy[T, T] for compression operations
// Both input and output are the same type T
// The Strategy.Type() method returns "compression" as the category
type CompressionStrategy[T any] interface {
    Strategy[T, T] // Extends the generic Strategy interface
    Compress(ctx context.Context, data T) (T, error)
    Decompress(ctx context.Context, data T) (T, error)
    CompressionType() CompressionType // Returns the specific compression algorithm type
    Name() string
}

// ByteCompressionStrategy is the concrete implementation for []byte data
type ByteCompressionStrategy interface {
    CompressionStrategy[[]byte]
}

// AdvancedCompressionStrategy for compression with additional validation and metrics
type AdvancedCompressionStrategy[T any] interface {
    CompressionStrategy[T]
    ValidateInput(ctx context.Context, data T) error
    GetCompressionRatio(ctx context.Context, original T, compressed T) float64
}

// StreamConfig handles streaming compression for files of any size
type StreamConfig struct {
    // Basic settings
    ChunkSize int64 // Size of processing chunks (0 = auto-calculate)
    TempDir string // Directory for temporary files ("" = system temp)
    MaxMemoryUsage int64 // Maximum memory usage (0 = auto-detect, -1 = no limit)
    UseDiskBuffering bool // Use disk for intermediate buffering
    CleanupTempFiles bool // Clean up temporary files after completion
    ProgressCallback func(bytesProcessed int64, totalBytes int64) // Progress reporting

    // Advanced settings (optional - use nil for defaults)
    UseParallelProcessing bool // Enable multi-core processing (default: true)
    MaxWorkers int // Maximum parallel workers (0 = auto-detect)
    CompressionLevel int // Compression level (0 = auto-select, 1-22 for zstd, 1-9 for others)
    UseSolidCompression bool // Use solid compression for better ratios
    ResumeFromOffset int64 // Resume from specific offset (0 = start)

    // Memory management enhancements
    MemoryStrategy MemoryStrategy // Memory management strategy
    AdaptiveChunking bool // Enable adaptive chunk sizing based on memory
    BufferPoolSize int // Buffer pool size (0 = auto-calculate)
    MaxTempFileSize int64 // Maximum temp file size before rotation (0 = no limit)

    // Concurrency and thread safety (see api_generics.md for ConcurrencyConfig and ThreadSafetyMode)
    ConcurrencyConfig *ConcurrencyConfig // Thread safety and worker management
    ThreadSafetyMode ThreadSafetyMode // Thread safety guarantees
}

// MemoryStrategy defines memory management approach
type MemoryStrategy int
const (
    MemoryStrategyConservative MemoryStrategy = iota // Use 25% of available RAM
    MemoryStrategyBalanced // Use 50% of available RAM (default)
    MemoryStrategyAggressive // Use 75% of available RAM
    MemoryStrategyCustom // Use MaxMemoryUsage value
)

This section describes built-in compression strategy implementations.
// Zstandard compression strategy with generic support
type ZstandardStrategy[T any] struct {
    level int
    strategy CompressionStrategy[T]
}

// LZ4 compression strategy with generic support
type LZ4Strategy[T any] struct {
    level int
    strategy CompressionStrategy[T]
}

// LZMA compression strategy with generic support
type LZMAStrategy[T any] struct {
    level int
    strategy CompressionStrategy[T]
}

// CompressionJob represents a unit of work for compression (extends Job)
type CompressionJob[T any] struct {
    *Job[T]
    CompressionType uint8
    CompressionLevel int
}

The compression API uses focused interfaces to provide clear separation of concerns and enable flexible composition.
// CompressionInfo provides read-only access to compression information
type CompressionInfo interface {
    GetCompressionInfo(ctx context.Context) PackageCompressionInfo
    IsCompressed() bool
    GetCompressionType() (uint8, error) // Returns compression type, or error if not compressed
    GetCompressionRatio() (float64, error) // Returns compression ratio, or error if not compressed
    CanCompress() bool
}

// CompressionOperations provides basic compression/decompression operations
type CompressionOperations interface {
    CompressPackage(ctx context.Context, compressionType uint8) error
    DecompressPackage(ctx context.Context) error
    SetCompressionType(ctx context.Context, compressionType uint8) error
}

// CompressionStreaming provides streaming compression for large packages
type CompressionStreaming interface {
    CompressPackageStream(ctx context.Context, compressionType uint8, config *StreamConfig) error
    DecompressPackageStream(ctx context.Context, config *StreamConfig) error
}

// CompressionFileOperations provides file-based compression operations
type CompressionFileOperations interface {
    CompressPackageFile(ctx context.Context, path string, compressionType uint8, overwrite bool) error
    DecompressPackageFile(ctx context.Context, path string, overwrite bool) error
}

Note: These methods compress or decompress the entire package structure. For compressing individual files within a package (without compressing the entire package), see File Compression API.
The CompressionStrategy[T] interface extends the generic Core Generic Types pattern for compression-specific operations.
CompressionStrategy[T] embeds Strategy[T, T] where both input and output are the same type.
The Process method from Strategy[T, T] can be used for compression operations, while Compress and Decompress provide more specific compression/decompression methods.
// Compression provides type-safe compression for any data type
type Compression[T any] interface {
    CompressGeneric(ctx context.Context, data T, strategy CompressionStrategy[T]) (T, error)
    DecompressGeneric(ctx context.Context, data T, strategy CompressionStrategy[T]) (T, error)
    ValidateCompressionData(ctx context.Context, data T) error
}

Cross-Reference: For the base strategy pattern, see Core Generic Types.
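To make the embedding of Strategy[T, T] concrete, here is a minimal sketch of a []byte strategy. Several parts are assumptions made for illustration: the `Strategy` method set (`Process`, `Type`) is a local stand-in for the definition in api_generics.md, `CompressionType` is assumed to be a small integer type, and DEFLATE from the Go standard library stands in for Zstd, LZ4, and LZMA, which are not in the standard library. `flateStrategy`, `demoType`, and `demo` are hypothetical names.

```go
package main

import (
    "bytes"
    "compress/flate"
    "context"
    "fmt"
    "io"
)

// Strategy is a local stand-in; its method set is an assumption.
type Strategy[I, O any] interface {
    Process(ctx context.Context, in I) (O, error)
    Type() string
}

// CompressionType is assumed to be a small integer type.
type CompressionType uint8

// CompressionStrategy mirrors the interface defined in this document.
type CompressionStrategy[T any] interface {
    Strategy[T, T]
    Compress(ctx context.Context, data T) (T, error)
    Decompress(ctx context.Context, data T) (T, error)
    CompressionType() CompressionType
    Name() string
}

// demoType is a hypothetical value outside the documented 0-3 range.
const demoType CompressionType = 255

// flateStrategy is a hypothetical []byte strategy backed by DEFLATE.
type flateStrategy struct{ level int }

// Compile-time check that flateStrategy satisfies the interface.
var _ CompressionStrategy[[]byte] = flateStrategy{}

func (s flateStrategy) Compress(ctx context.Context, data []byte) ([]byte, error) {
    if err := ctx.Err(); err != nil {
        return nil, err
    }
    var buf bytes.Buffer
    w, err := flate.NewWriter(&buf, s.level)
    if err != nil {
        return nil, err
    }
    if _, err := w.Write(data); err != nil {
        return nil, err
    }
    if err := w.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

func (s flateStrategy) Decompress(ctx context.Context, data []byte) ([]byte, error) {
    if err := ctx.Err(); err != nil {
        return nil, err
    }
    r := flate.NewReader(bytes.NewReader(data))
    defer r.Close()
    return io.ReadAll(r)
}

// Process delegates to Compress, as the text above suggests it may.
func (s flateStrategy) Process(ctx context.Context, in []byte) ([]byte, error) {
    return s.Compress(ctx, in)
}

func (s flateStrategy) Type() string                     { return "compression" }
func (s flateStrategy) CompressionType() CompressionType { return demoType }
func (s flateStrategy) Name() string                     { return "flate-demo" }

// demo round-trips repetitive data and reports whether it survived and shrank.
func demo() (bool, error) {
    var s CompressionStrategy[[]byte] = flateStrategy{level: flate.BestSpeed}
    in := bytes.Repeat([]byte("novuspack "), 100)
    packed, err := s.Process(context.Background(), in)
    if err != nil {
        return false, err
    }
    out, err := s.Decompress(context.Background(), packed)
    if err != nil {
        return false, err
    }
    return bytes.Equal(in, out) && len(packed) < len(in), nil
}

func main() {
    ok, err := demo()
    fmt.Println(ok, err)
}
```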
These methods operate on packages in memory without writing to disk.
Note: For large packages, consider using Streaming Compression Methods to avoid memory limitations.
// CompressPackage compresses package content in memory
// Compresses file entries and data separately using LZ4 for metadata and specified type for data
// Compresses file index with LZ4
// Creates metadata index for fast access
// Header, metadata index, comment, and signatures are NOT compressed
// Signed packages cannot be compressed
// Returns *PackageError on failure
func (p *Package) CompressPackage(ctx context.Context, compressionType uint8) error

Handles compression and decompression of in-memory packages, with separate metadata and data compression.
ctx: Context for cancellation and timeout handling
compressionType: Compression algorithm to use for file data (1-3); metadata always uses LZ4
- Compresses FileEntry metadata individually using LZ4
- Compresses file data individually using specified compression type (Zstd, LZ4, or LZMA)
- Compresses special metadata files (types 65000-65535) with LZ4 for fast access
- Compresses file index with LZ4 as a single block
- Creates metadata index for fast access to compressed blocks
- Updates package compression state in memory
- Returns error if package is signed
- Updates package header compression flags (bits 15-8)
- Writes metadata index at fixed offset 112 bytes (immediately after header)
- Package is already signed (cannot compress signed packages)
- Invalid compression type (must be 1-3)
- Package is already compressed with different type
- Context cancellation
- Metadata index creation failure
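The error conditions above suggest a set of precondition checks before any compression work starts. The following is a sketch under stated assumptions: `pkgState` and `checkCanCompress` are hypothetical, plain sentinel errors stand in for the *PackageError values the real API returns, and the check order is a guess. It follows the 1-3 rule stated for CompressPackage and does not model automatic selection for type 0.

```go
package main

import (
    "context"
    "errors"
    "fmt"
)

// pkgState is a hypothetical stand-in for the state CompressPackage inspects.
type pkgState struct {
    signed      bool
    currentType uint8 // 0 = not compressed
}

// Sentinel errors used only in this sketch.
var (
    errSigned       = errors.New("package is signed")
    errInvalidType  = errors.New("invalid compression type (must be 1-3)")
    errTypeMismatch = errors.New("package already compressed with a different type")
)

// checkCanCompress applies the documented error conditions in one place.
func checkCanCompress(ctx context.Context, s pkgState, compressionType uint8) error {
    if err := ctx.Err(); err != nil {
        return err // context cancelled or timed out
    }
    if s.signed {
        return errSigned
    }
    if compressionType < 1 || compressionType > 3 {
        return errInvalidType
    }
    if s.currentType != 0 && s.currentType != compressionType {
        return errTypeMismatch
    }
    return nil
}

// check is a small convenience wrapper using a background context.
func check(signed bool, current, requested uint8) error {
    return checkCanCompress(context.Background(), pkgState{signed: signed, currentType: current}, requested)
}

func main() {
    fmt.Println(check(true, 0, 1))
    fmt.Println(check(false, 0, 7))
    fmt.Println(check(false, 2, 1))
    fmt.Println(check(false, 0, 1))
}
```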
// DecompressPackage decompresses the package in memory
// Decompresses all compressed content
// Returns *PackageError on failure
func (p *Package) DecompressPackage(ctx context.Context) error

Decompress package content in memory
ctx: Context for cancellation and timeout handling
- Decompresses all compressed content (metadata blocks, data blocks, and file index)
- Updates package compression state in memory
- Clears package header compression flags (bits 15-8)
- Removes metadata index (no longer needed when uncompressed)
- Preserves all other package data
- Package is not compressed
- Decompression failure
- Context cancellation
These methods handle compression/decompression of large packages using streaming to avoid memory limitations.
For Large Files: These methods use temporary files and chunked processing to handle files that exceed available RAM, with adaptive strategies based on configuration.
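Chunked processing can be sketched with a fixed-size buffer that is reused for every read, so memory use stays bounded regardless of input size. This example is illustrative only: gzip from the Go standard library stands in for the package compression types, `compressChunked` is a hypothetical helper, and the 64 KiB fallback for a non-positive chunk size is an arbitrary assumption standing in for "auto-calculate".

```go
package main

import (
    "bytes"
    "compress/gzip"
    "context"
    "fmt"
    "io"
    "strings"
)

// compressChunked streams src into dst through a gzip writer, reading at most
// chunkSize bytes at a time. It checks ctx between chunks and reports the
// running total of input bytes through progress (which may be nil).
func compressChunked(ctx context.Context, dst io.Writer, src io.Reader, chunkSize int, progress func(done int64)) (int64, error) {
    if chunkSize <= 0 {
        chunkSize = 64 * 1024 // assumed fallback for "auto"
    }
    zw := gzip.NewWriter(dst)
    buf := make([]byte, chunkSize)
    var total int64
    for {
        if err := ctx.Err(); err != nil {
            return total, err
        }
        n, err := src.Read(buf)
        if n > 0 {
            if _, werr := zw.Write(buf[:n]); werr != nil {
                return total, werr
            }
            total += int64(n)
            if progress != nil {
                progress(total)
            }
        }
        if err == io.EOF {
            break
        }
        if err != nil {
            return total, err
        }
    }
    return total, zw.Close()
}

// demo compresses 10,000 bytes in 4,096-byte chunks and counts progress calls.
func demo() (processed int64, calls int, err error) {
    src := strings.NewReader(strings.Repeat("x", 10000))
    var out bytes.Buffer
    processed, err = compressChunked(context.Background(), &out, src, 4096, func(int64) { calls++ })
    return processed, calls, err
}

func main() {
    processed, calls, err := demo()
    fmt.Println(processed, calls, err)
}
```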
// CompressPackageStream compresses large package content using streaming
// Uses temporary files and chunked processing to handle files of any size
// Configuration determines the level of optimization and memory management
// Returns *PackageError on failure
func (p *Package) CompressPackageStream(ctx context.Context, compressionType uint8, config *StreamConfig) error

Handle compression of large packages using streaming, temporary files, and configurable optimization strategies for files of any size
ctx: Context for cancellation and timeout handling
compressionType: Compression algorithm to use (1-3)
config: Unified streaming configuration for memory management and optimization
- Uses streaming for large package content
- Creates temporary files when needed for memory management
- Compresses file entries + data + index (NOT header, comment, or signatures)
- Returns error if package is signed
- Updates package header compression flags
- Adaptive Processing: Automatically adjusts strategy based on file size and configuration
- Memory Management: Respects StreamConfig.MaxMemoryUsage to prevent OOM
- Progress Reporting: Provides progress updates for long-running operations
- Parallel Processing: Uses multiple CPU cores when enabled in configuration
- Chunked Processing: Processes files in configurable chunks (auto-calculated if 0)
- Security Errors: Package is already signed (cannot compress signed packages)
- Validation Errors: Invalid compression type, invalid stream configuration
- I/O Errors: Temporary file creation failed, insufficient disk space
- Context Errors: Context cancellation or timeout exceeded
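One way the MemoryStrategy percentages and MaxMemoryUsage could combine into a byte budget is sketched below. The function `memoryBudget` is hypothetical, the caller is assumed to supply the available RAM, and the handling of Custom with 0 (fall back to Balanced) or -1 (use all available RAM) is an interpretation of the field comments rather than documented behavior.

```go
package main

import "fmt"

// MemoryStrategy mirrors the type defined in this document.
type MemoryStrategy int

const (
    MemoryStrategyConservative MemoryStrategy = iota // 25% of available RAM
    MemoryStrategyBalanced                           // 50% of available RAM
    MemoryStrategyAggressive                         // 75% of available RAM
    MemoryStrategyCustom                             // use MaxMemoryUsage
)

// memoryBudget is a hypothetical helper that turns a strategy into bytes.
func memoryBudget(strategy MemoryStrategy, availableRAM, maxMemoryUsage int64) int64 {
    switch strategy {
    case MemoryStrategyConservative:
        return availableRAM / 4
    case MemoryStrategyAggressive:
        return availableRAM / 4 * 3
    case MemoryStrategyCustom:
        if maxMemoryUsage > 0 {
            return maxMemoryUsage
        }
        if maxMemoryUsage < 0 {
            return availableRAM // assumed meaning of "no limit"
        }
        // maxMemoryUsage == 0: fall through to the balanced default below.
    }
    return availableRAM / 2
}

func main() {
    const ram = int64(8) << 30 // pretend 8 GiB are available
    fmt.Println(memoryBudget(MemoryStrategyConservative, ram, 0))
    fmt.Println(memoryBudget(MemoryStrategyBalanced, ram, 0))
    fmt.Println(memoryBudget(MemoryStrategyCustom, ram, 1<<30))
}
```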
The unified StreamConfig supports different usage patterns based on requirements:
Simple Usage (basic settings only):
config := &StreamConfig{
    ChunkSize: 0, // Auto-calculate
    MaxMemoryUsage: 0, // Auto-detect
    TempDir: "", // System temp
}

Advanced Usage (full configuration):
config := &StreamConfig{
    ChunkSize: 1024 * 1024 * 1024, // 1GB chunks
    MaxMemoryUsage: 8 * 1024 * 1024 * 1024, // 8GB limit
    UseParallelProcessing: true,
    MaxWorkers: 0, // Auto-detect
    CompressionLevel: 0, // Auto-select
    UseSolidCompression: true,
    MemoryStrategy: MemoryStrategyBalanced,
    AdaptiveChunking: true,
}

// DecompressPackageStream decompresses large package content using streaming
// Uses streaming to manage memory efficiently for large packages
// Returns *PackageError on failure
func (p *Package) DecompressPackageStream(ctx context.Context, config *StreamConfig) error

Decompress large package content using streaming
ctx: Context for cancellation and timeout handling
config: Streaming configuration for memory management
- Uses streaming for large package content
- Decompresses all compressed content
- Updates package compression state in memory
- Clears package header compression flags
- Preserves all other package data
- Validation Errors: Package is not compressed, invalid stream configuration
- Compression Errors: Decompression operation failed, algorithm-specific failures
- I/O Errors: Streaming operation failed, insufficient disk space
- Context Errors: Context cancellation or timeout exceeded
These methods handle both compression/decompression and writing to a file.
Note: These methods compress or decompress the entire package structure. For compressing individual files within a package (FileEntry.Compress, Package.CompressFile), see File Compression API.
// CompressPackageFile compresses package content and writes to specified path
// Compresses file entries + data + index (NOT header, comment, or signatures)
// Signed packages cannot be compressed
// Returns *PackageError on failure
func (p *Package) CompressPackageFile(ctx context.Context, path string, compressionType uint8, overwrite bool) error

Handle compression AND write to file
ctx: Context for cancellation and timeout handling
path: Target file path for compressed package
compressionType: Compression algorithm to use (1-3)
overwrite: Whether to overwrite existing file
- Compresses package content in memory
- Writes compressed package to specified path
- Creates new file by default, overwrites if overwrite=true
- Compresses file entries + data + index (NOT header, comment, or signatures)
- Returns error if package is signed
- Package is already signed
- Invalid compression type
- File already exists and overwrite=false
- I/O errors
- Context cancellation
// DecompressPackageFile decompresses the package and writes to specified path
// Decompresses all compressed content and writes uncompressed package
// Returns *PackageError on failure
func (p *Package) DecompressPackageFile(ctx context.Context, path string, overwrite bool) error

Decompress package and write to file
ctx: Context for cancellation and timeout handling
path: Target file path for uncompressed package
overwrite: Whether to overwrite existing file
- Decompresses package content in memory
- Writes uncompressed package to specified path
- Creates new file by default, overwrites if overwrite=true
- Decompresses all compressed content
- Package is not compressed
- File already exists and overwrite=false
- I/O errors
- Context cancellation
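The overwrite semantics shared by the file-based methods can be approximated with standard `os.OpenFile` flags. This is a sketch, not the library's implementation: `writePackageFile` is a hypothetical helper, and a real implementation might write to a temporary file and rename it instead.

```go
package main

import (
    "errors"
    "fmt"
    "io/fs"
    "os"
    "path/filepath"
)

// writePackageFile is a hypothetical helper showing the overwrite flag.
// With overwrite=false, O_EXCL makes the open fail if the file exists.
func writePackageFile(path string, data []byte, overwrite bool) error {
    flags := os.O_WRONLY | os.O_CREATE
    if overwrite {
        flags |= os.O_TRUNC
    } else {
        flags |= os.O_EXCL
    }
    f, err := os.OpenFile(path, flags, 0o644)
    if err != nil {
        return err
    }
    if _, err := f.Write(data); err != nil {
        f.Close()
        return err
    }
    return f.Close()
}

// demo writes the same path three times inside a temporary directory.
func demo() (firstErr error, rejectedExisting bool, overwriteErr error) {
    dir, err := os.MkdirTemp("", "nvpk-demo")
    if err != nil {
        return err, false, nil
    }
    defer os.RemoveAll(dir)

    path := filepath.Join(dir, "out.nvpk")
    firstErr = writePackageFile(path, []byte("v1"), false)
    secondErr := writePackageFile(path, []byte("v2"), false)
    rejectedExisting = errors.Is(secondErr, fs.ErrExist)
    overwriteErr = writePackageFile(path, []byte("v3"), true)
    return firstErr, rejectedExisting, overwriteErr
}

func main() {
    first, rejected, over := demo()
    fmt.Println(first, rejected, over)
}
```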
This section describes compression information and status operations.
See 1.3 PackageCompressionInfo Struct for the complete structure definition.
This section describes compression status methods.
This section describes package compression query methods.
// GetPackageCompressionInfo returns package compression information
func (p *Package) GetPackageCompressionInfo() PackageCompressionInfo

// IsPackageCompressed checks if the package is compressed
// Checks header flags bits 15-8 for compression type
func (p *Package) IsPackageCompressed() bool

// GetPackageCompressionType returns the package compression type
// Returns compression type from header flags bits 15-8
// Returns *PackageError if package is not compressed
func (p *Package) GetPackageCompressionType() (uint8, error)

// GetPackageCompressionRatio returns the compression ratio
// Returns *PackageError if package is not compressed
func (p *Package) GetPackageCompressionRatio() (float64, error)

// GetPackageOriginalSize returns the original size before compression
// Returns *PackageError if package is not compressed
func (p *Package) GetPackageOriginalSize() (int64, error)

// GetPackageCompressedSize returns the compressed size
// Returns *PackageError if package is not compressed
func (p *Package) GetPackageCompressedSize() (int64, error)

This section describes package compression control methods.
// SetPackageCompressionType sets the package compression type (without compressing)
// Returns *PackageError on failure
func (p *Package) SetPackageCompressionType(compressionType uint8) error

// CanCompressPackage checks if package can be compressed (not signed)
func (p *Package) CanCompressPackage() bool

This section describes metadata index methods.
// HasMetadataIndex checks if package has metadata index (compression enabled)
// Returns true if header flags bits 15-8 != 0
func (p *Package) HasMetadataIndex() bool

// GetMetadataIndexOffset returns the offset to metadata index
// Returns fixed offset 112 bytes (PackageHeaderSize) when compression enabled
// Returns *PackageError if package is not compressed (no metadata index)
func (p *Package) GetMetadataIndexOffset() (int64, error)

This section describes generic compression methods.
// Generic compression methods for type-safe operations
// CompressionStrategy[T] embeds Strategy[T, T] from the generics package
// See [Core Generic Types](api_generics.md#1-core-generic-types) for base strategy pattern
func (p *Package) CompressGeneric[T any](ctx context.Context, data T, strategy CompressionStrategy[T]) (T, error)

// DecompressGeneric decompresses data using a generic compression strategy.
func (p *Package) DecompressGeneric[T any](ctx context.Context, data T, strategy CompressionStrategy[T]) (T, error)

// Returns *PackageError on failure
func (p *Package) ValidateCompressionData[T any](ctx context.Context, data T) error

Type Constraints: The type parameter T in CompressGeneric and DecompressGeneric is typically []byte for compression operations, but can be any type that the CompressionStrategy[T] supports.
For most use cases, T should be []byte to work with data directly.
The constraint any is used because compression strategies may work with different data types (e.g., []byte, custom serializable types).
Error Handling: All compression operations return errors using NewPackageError or WrapErrorWithContext with typed error context for type-safe error handling.
See Error Handling for details.
This section describes internal compression methods.
// Internal compression methods (used by CompressPackage and Write)
// Returns *PackageError on failure
func (p *Package) compressPackageContent(ctx context.Context, compressionType uint8) error

// Returns *PackageError on failure
func (p *Package) decompressPackageContent(ctx context.Context) error

The compression API provides explicit concurrency patterns and thread safety guarantees for safe concurrent usage.
The compression API provides different levels of thread safety based on the ThreadSafetyMode configuration:
No thread safety guarantees. Operations should not be called concurrently.
Read-only operations are safe for concurrent access. Multiple goroutines can safely call read methods simultaneously.
Concurrent read/write operations are supported. Uses read-write mutex for optimal read performance.
Full thread safety with complete synchronization. All operations are protected by appropriate locking mechanisms.
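The read-write mutex mode described above can be pictured as follows: readers share the lock, while a writer takes it exclusively. This is a generic sketch using `sync.RWMutex`; the `compressionState` type and its methods are hypothetical and do not reflect the library's internals.

```go
package main

import (
    "fmt"
    "sync"
)

// compressionState is a hypothetical piece of shared package state.
type compressionState struct {
    mu  sync.RWMutex
    typ uint8 // 0 = not compressed
}

// IsCompressed takes the read lock, so many readers can run at once.
func (s *compressionState) IsCompressed() bool {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.typ != 0
}

// SetType takes the write lock, which excludes readers and other writers.
func (s *compressionState) SetType(t uint8) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.typ = t
}

// demo runs several readers alongside one writer and returns the final state.
func demo() bool {
    var s compressionState
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _ = s.IsCompressed()
        }()
    }
    wg.Add(1)
    go func() {
        defer wg.Done()
        s.SetType(2)
    }()
    wg.Wait()
    return s.IsCompressed()
}

func main() {
    fmt.Println(demo())
}
```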
The compression API uses the generic worker pool patterns defined in api_generics.md with compression-specific extensions.
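A method such as CompressConcurrently might be built on a bounded pool that keeps results in input order. The sketch below shows that shape with the standard library only. `processConcurrently` is a hypothetical function, it takes a plain callback instead of a CompressionStrategy[T], and returning only the first error is a simplifying assumption.

```go
package main

import (
    "context"
    "fmt"
    "strings"
    "sync"
)

// processConcurrently applies fn to every item using at most `workers`
// goroutines. Results keep the order of the input slice. On the first
// error the remaining work is cancelled and that error is returned.
func processConcurrently[T any](ctx context.Context, items []T, workers int, fn func(context.Context, T) (T, error)) ([]T, error) {
    if workers < 1 {
        workers = 1
    }
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    out := make([]T, len(items))
    jobs := make(chan int)
    var (
        wg       sync.WaitGroup
        once     sync.Once
        firstErr error
    )
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := range jobs {
                res, err := fn(ctx, items[i])
                if err != nil {
                    once.Do(func() { firstErr = err; cancel() })
                    continue
                }
                out[i] = res // each index is written by exactly one worker
            }
        }()
    }

feed:
    for i := range items {
        select {
        case jobs <- i:
        case <-ctx.Done():
            break feed
        }
    }
    close(jobs)
    wg.Wait()

    if firstErr != nil {
        return nil, firstErr
    }
    if err := ctx.Err(); err != nil {
        return nil, err
    }
    return out, nil
}

// demo upper-cases three strings with two workers.
func demo() ([]string, error) {
    upper := func(_ context.Context, s string) (string, error) {
        return strings.ToUpper(s), nil
    }
    return processConcurrently(context.Background(), []string{"a", "b", "c"}, 2, upper)
}

func main() {
    out, err := demo()
    fmt.Println(out, err)
}
```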
// CompressionWorkerPool extends WorkerPool for compression operations
type CompressionWorkerPool[T any] struct {
    *WorkerPool[T]
    compressionStrategy CompressionStrategy[T]
}

// Compression-specific methods
func (p *CompressionWorkerPool[T]) CompressConcurrently(ctx context.Context, data []T, strategy CompressionStrategy[T]) ([]T, error)

// DecompressConcurrently decompresses multiple data items concurrently using a worker pool.
func (p *CompressionWorkerPool[T]) DecompressConcurrently(ctx context.Context, data []T, strategy CompressionStrategy[T]) ([]T, error)

// GetCompressionStats returns statistics about compression operations performed by the worker pool.
func (p *CompressionWorkerPool[T]) GetCompressionStats() CompressionStats

This section describes concurrent compression methods.
// CompressPackageConcurrent compresses package content using worker pool
// Returns *PackageError on failure
func (p *Package) CompressPackageConcurrent(ctx context.Context, compressionType uint8, config *StreamConfig) error

// DecompressPackageConcurrent decompresses package content using worker pool
// Returns *PackageError on failure
func (p *Package) DecompressPackageConcurrent(ctx context.Context, config *StreamConfig) error

The compression API uses the generic resource management patterns defined in api_generics.md with compression-specific resource types.
This section describes the CompressionResourcePool structure.
// CompressionResourcePool manages compression-specific resources
type CompressionResourcePool struct {
    *ResourcePool[CompressionResource]
    compressionConfig *CompressionConfig
}

// Compression-specific resource management methods
func (p *CompressionResourcePool) AcquireCompressionResource(ctx context.Context, strategyType uint8) (*CompressionResource, error)

// Returns *PackageError on failure
func (p *CompressionResourcePool) ReleaseCompressionResource(resource *CompressionResource) error

// GetCompressionResourceStats returns statistics about compression resource usage.
func (p *CompressionResourcePool) GetCompressionResourceStats() CompressionResourceStats

// CompressionResource represents a compression-specific resource
type CompressionResource struct {
    ID string
    Strategy CompressionStrategy[[]byte]
    Buffer []byte
    LastUsed time.Time
    AccessCount int64
}

The compression API provides compression-specific configuration patterns that extend the generic configuration patterns defined in api_generics.md.
This section describes compression-specific configuration options.
// CompressionConfig extends Config for compression-specific settings
type CompressionConfig struct {
    *Config[[]byte]

    // Compression-specific settings
    CompressionType Option[uint8] // Compression algorithm type
    CompressionLevel Option[int] // Compression level (1-22 for zstd, 1-9 for others)
    UseSolidCompression Option[bool] // Use solid compression for better ratios
    ResumeFromOffset Option[int64] // Resume from specific offset
    MemoryStrategy Option[MemoryStrategy] // Memory management strategy
}

This section describes the CompressionConfigBuilder structure.
// CompressionConfigBuilder provides fluent configuration building for compression
type CompressionConfigBuilder struct {
    config *CompressionConfig
}

// NewCompressionConfigBuilder creates a new compression configuration builder.
func NewCompressionConfigBuilder() *CompressionConfigBuilder

// WithCompressionType sets the compression type for the configuration.
func (b *CompressionConfigBuilder) WithCompressionType(compType uint8) *CompressionConfigBuilder

// WithCompressionLevel sets the compression level for the configuration.
func (b *CompressionConfigBuilder) WithCompressionLevel(level int) *CompressionConfigBuilder

// WithSolidCompression enables or disables solid compression for the configuration.
func (b *CompressionConfigBuilder) WithSolidCompression(useSolid bool) *CompressionConfigBuilder

// WithMemoryStrategy sets the memory strategy for the configuration.
func (b *CompressionConfigBuilder) WithMemoryStrategy(strategy MemoryStrategy) *CompressionConfigBuilder

// Build constructs and returns the final compression configuration.
func (b *CompressionConfigBuilder) Build() *CompressionConfig

This section describes compression validation patterns.
This section describes the CompressionValidator structure.
// CompressionValidator provides compression-specific validation
type CompressionValidator struct {
    *Validator[[]byte]
    compressionRules []CompressionValidationRule
}

// AddCompressionRule adds a compression validation rule to the validator.
func (v *CompressionValidator) AddCompressionRule(rule CompressionValidationRule)

// Returns *PackageError on failure
func (v *CompressionValidator) ValidateCompressionData(ctx context.Context, data []byte) error

// Returns *PackageError on failure
func (v *CompressionValidator) ValidateDecompressionData(ctx context.Context, data []byte) error

// CompressionValidationRule represents a compression-specific validation rule
type CompressionValidationRule struct {
    Name string
    Predicate func([]byte) bool
    Message string
}

This section describes the relationship between compression and signing operations.
This section describes signing compressed packages.
Compressed packages can be signed
- Compress package content using CompressPackage or CompressPackageFile
- Sign the compressed package using signature methods
- Signatures validate the compressed content
- Faster signature validation (less data to hash during validation)
- Reduced overall package storage requirements (compressed content reduces total package size)
This section describes compressing signed packages.
Signed packages cannot be compressed
- Signatures validate specific content
- Compression would change the content being validated
- Would invalidate existing signatures
- If package is signed, decompress first
- Make changes to package
- Recompress if desired
- Re-sign the package
This section describes how to select compression strategies.
This section describes compression type selection criteria.
Zstandard (CompressionZstd):
- Good compression ratio with balanced speed
- Moderate CPU usage
- Good for archival storage
LZ4 (CompressionLZ4):
- Fastest compression/decompression
- Lower compression ratio
- Good for real-time applications
LZMA (CompressionLZMA):
- Highest compression ratio
- Highest CPU usage
- Best for long-term storage
This section provides a decision matrix for selecting compression strategies.
The following table provides guidance for manual compression type selection based on intended use case:
| Use Case | Recommended Type | Reason |
|---|---|---|
| Real-time processing | LZ4 | Speed priority |
| Archival storage | Zstandard | Balanced performance |
| Maximum compression | LZMA | Size priority |
| Network transfer | Zstandard | Good balance |
When compression is requested but compression type is not explicitly specified (compressionType = 0 or omitted), the API automatically selects the optimal compression algorithm based on package characteristics.
The automatic selection analyzes the following package properties:
- Total Package Size: Uncompressed size of all file entries, data, and index
- File Count: Number of files in the package
- File Type Distribution: Classification of files by type (text, binary, already-compressed)
- Average File Size: Total size divided by file count
- Content Compressibility: Estimated compression potential based on file types
The algorithm applies the following rules in order of priority:
1. Already-Compressed Content Detection:
   - If >50% of package content consists of already-compressed formats (JPEG, PNG, GIF, MP3, MP4, OGG, FLAC), select LZ4 (fast, minimal benefit from heavy compression)
2. Small Package Optimization:
   - If total package size < 10MB, select LZ4 (speed over compression ratio for small packages)
   - Rationale: Compression overhead outweighs benefits for small packages
3. Many Small Files:
   - If file count > 100 AND average file size < 10KB, select LZ4 (fast compression for many small files)
   - Rationale: Package structure overhead makes compression ratio less important than speed
4. Large Package with Text-Heavy Content:
   - If total package size > 100MB AND text-based files (text, scripts, configs) represent >60% of content, select LZMA (maximum compression for compressible content)
   - Rationale: Text compresses well, large size justifies CPU cost
5. Large Package with Mixed Content:
   - If total package size > 100MB AND text-based files represent 30-60% of content, select Zstandard (balanced compression)
   - Rationale: Mixed content benefits from balanced approach
6. Large Package with Binary-Heavy Content:
   - If total package size > 100MB AND binary files represent >60% of content, select Zstandard (good compression for binary with reasonable speed)
   - Rationale: Binary doesn't compress as well, balanced approach optimal
7. Medium Package Default:
   - If total package size 10MB - 100MB, select Zstandard (balanced performance for medium packages)
   - Rationale: Default balanced approach for moderate sizes
8. Fallback Default:
   - If no specific rules apply, select Zstandard (safe balanced default)
   - Rationale: Zstandard provides good balance of speed and compression for most scenarios
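The priority-ordered rules above map naturally onto a `switch` whose cases are evaluated top to bottom. The sketch below is one possible reading, not the library's implementation: `packageStats` and `selectType` are hypothetical, percentages are taken by bytes, and MB and KB are assumed to be binary units (1024-based).

```go
package main

import "fmt"

// Compression type values as listed in this document.
const (
    CompressionZstd = 1
    CompressionLZ4  = 2
    CompressionLZMA = 3
)

// Size units; treating MB and KB as 1024-based is an assumption.
const (
    kb = 1024
    mb = 1024 * kb
)

// packageStats is a hypothetical summary of the analyzed properties.
type packageStats struct {
    TotalSize          int64
    FileCount          int
    TextBytes          int64
    BinaryBytes        int64
    PrecompressedBytes int64
}

// share returns part/total as a fraction, or 0 for an empty package.
func share(part, total int64) float64 {
    if total == 0 {
        return 0
    }
    return float64(part) / float64(total)
}

// selectType applies the rules in priority order; the first match wins.
func selectType(s packageStats) uint8 {
    text := share(s.TextBytes, s.TotalSize)
    binary := share(s.BinaryBytes, s.TotalSize)
    var avg int64
    if s.FileCount > 0 {
        avg = s.TotalSize / int64(s.FileCount)
    }

    switch {
    case share(s.PrecompressedBytes, s.TotalSize) > 0.5:
        return CompressionLZ4 // rule 1: already-compressed content
    case s.TotalSize < 10*mb:
        return CompressionLZ4 // rule 2: small package
    case s.FileCount > 100 && avg < 10*kb:
        return CompressionLZ4 // rule 3: many small files
    case s.TotalSize > 100*mb && text > 0.6:
        return CompressionLZMA // rule 4: large, text-heavy
    case s.TotalSize > 100*mb && text >= 0.3:
        return CompressionZstd // rule 5: large, mixed
    case s.TotalSize > 100*mb && binary > 0.6:
        return CompressionZstd // rule 6: large, binary-heavy
    default:
        return CompressionZstd // rules 7 and 8: medium size or fallback
    }
}

func main() {
    s := packageStats{TotalSize: 200 * mb, FileCount: 50, TextBytes: 150 * mb}
    fmt.Println(selectType(s))
}
```

Because rules 5 through 8 all select Zstandard, they could be collapsed into the default case; they are kept separate here only to mirror the list.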
For selection algorithm purposes, files are classified as:
- Text-based: Text files, scripts, configuration files, source code, JSON, XML, CSV
- Binary: Executables, compiled binaries, databases, proprietary formats
- Already-compressed: JPEG, PNG, GIF, MP3, MP4, OGG, FLAC, ZIP, GZIP, etc.
- Media: Images, audio, video (excludes already-compressed formats)
The algorithm uses SelectCompressionType logic for individual file classification where applicable.
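The priority rules above can be sketched as a single first-match switch. This is a minimal illustration, not the library's implementation: `Algo`, `PackageStats`, and `selectAlgo` are hypothetical names, the string-valued algorithm constants deliberately avoid assuming the numeric compressionType codes, and the byte totals per file class are assumed to come from the package metadata.

```go
package main

import "fmt"

// Algo names a compression algorithm. The string values are illustrative;
// the numeric compressionType codes of the real API are not assumed here.
type Algo string

const (
	LZ4  Algo = "LZ4"
	Zstd Algo = "Zstandard"
	LZMA Algo = "LZMA"
)

// PackageStats is a hypothetical summary derived from package metadata.
type PackageStats struct {
	TotalSize       int64 // total content size in bytes
	FileCount       int
	TextBytes       int64 // bytes in text-based files
	BinaryBytes     int64 // bytes in binary files
	CompressedBytes int64 // bytes in already-compressed formats
}

const (
	kb = int64(1) << 10
	mb = int64(1) << 20
)

// share returns part/total as a fraction, or 0 for an empty package.
func share(part, total int64) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) / float64(total)
}

// selectAlgo applies the rules in priority order and returns the first match.
func selectAlgo(s PackageStats) Algo {
	text := share(s.TextBytes, s.TotalSize)
	binary := share(s.BinaryBytes, s.TotalSize)
	compressed := share(s.CompressedBytes, s.TotalSize)
	var avg int64
	if s.FileCount > 0 {
		avg = s.TotalSize / int64(s.FileCount)
	}
	large := s.TotalSize > 100*mb

	switch {
	case compressed > 0.5: // rule 1: mostly already-compressed content
		return LZ4
	case s.TotalSize < 10*mb: // rule 2: small package
		return LZ4
	case s.FileCount > 100 && avg < 10*kb: // rule 3: many small files
		return LZ4
	case large && text > 0.6: // rule 4: large and text-heavy
		return LZMA
	case large && text >= 0.3: // rule 5: large with 30-60% text
		return Zstd
	case large && binary > 0.6: // rule 6: large and binary-heavy
		return Zstd
	default: // rules 7 and 8 both resolve to Zstandard
		return Zstd
	}
}

func main() {
	docs := PackageStats{TotalSize: 500 * mb, FileCount: 2000, TextBytes: 400 * mb, BinaryBytes: 100 * mb}
	fmt.Println(selectAlgo(docs))
}
```

Because the function depends only on the metadata summary, the same package properties always produce the same selection, which matches the consistency guarantee stated for automatic selection.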
When automatic selection is triggered:
- Selection occurs during CompressPackage, CompressPackageFile, CompressPackageStream, or Write operations
- Selected compression type is logged for debugging/monitoring
- User can override by explicitly specifying compressionType (1-3)
- Selection is consistent: same package properties always yield same selection
Automatic selection has minimal overhead:
- Analysis uses existing package metadata (file entries, sizes, types)
- No content scanning required beyond metadata lookup
- Selection decision is O(n) where n is file count
- Selection adds <1ms overhead for typical packages
This section describes different compression workflow options.
Compress the package content in memory first, then write the compressed package to disk.
Call CompressPackage with the desired compression type to compress the package content in memory.
Call Write with CompressionNone to write the already-compressed package to the output file without additional compression.
Compress the package content and write it to disk in a single operation.
Call CompressPackageFile with the target file path, compression type, and overwrite flag to compress and write the package in one step.
Write the package to disk with compression applied during the write operation.
Call Write with the target file path, compression type, and overwrite flag to write the package with compression applied during the write process.
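The three workflows above differ only in which calls are made and in what order. The sketch below makes that order explicit against a stand-in type that records calls. The method signatures are assumptions inferred from the prose (context first, then the arguments each step names), `CompressionZstd` and the numeric values of both constants are hypothetical, and `recorder` is not part of the API.

```go
package main

import (
	"context"
	"fmt"
)

// Assumed compression type codes; the real values are defined by the API.
const (
	CompressionNone uint8 = 0
	CompressionZstd uint8 = 1
)

// packageWriter lists the methods the workflows use, with assumed signatures.
type packageWriter interface {
	CompressPackage(ctx context.Context, compressionType uint8) error
	CompressPackageFile(ctx context.Context, path string, compressionType uint8, overwrite bool) error
	Write(ctx context.Context, path string, compressionType uint8, overwrite bool) error
}

// recorder is a stand-in that only records which calls were made.
type recorder struct{ calls []string }

func (r *recorder) CompressPackage(ctx context.Context, t uint8) error {
	r.calls = append(r.calls, fmt.Sprintf("CompressPackage(%d)", t))
	return ctx.Err()
}

func (r *recorder) CompressPackageFile(ctx context.Context, path string, t uint8, overwrite bool) error {
	r.calls = append(r.calls, fmt.Sprintf("CompressPackageFile(%s, %d, %t)", path, t, overwrite))
	return ctx.Err()
}

func (r *recorder) Write(ctx context.Context, path string, t uint8, overwrite bool) error {
	r.calls = append(r.calls, fmt.Sprintf("Write(%s, %d, %t)", path, t, overwrite))
	return ctx.Err()
}

// compressThenWrite compresses in memory, then writes without recompressing.
func compressThenWrite(ctx context.Context, p packageWriter, path string) error {
	if err := p.CompressPackage(ctx, CompressionZstd); err != nil {
		return fmt.Errorf("compress: %w", err)
	}
	return p.Write(ctx, path, CompressionNone, true)
}

// compressAndWrite compresses and writes in a single call.
func compressAndWrite(ctx context.Context, p packageWriter, path string) error {
	return p.CompressPackageFile(ctx, path, CompressionZstd, true)
}

// writeWithCompression lets Write apply compression while writing.
func writeWithCompression(ctx context.Context, p packageWriter, path string) error {
	return p.Write(ctx, path, CompressionZstd, true)
}

// runWorkflows runs all three workflows and returns the recorded calls.
func runWorkflows() ([]string, error) {
	ctx := context.Background()
	r := &recorder{}
	steps := []func(context.Context, packageWriter, string) error{
		compressThenWrite, compressAndWrite, writeWithCompression,
	}
	for _, step := range steps {
		if err := step(ctx, r, "out.nvpk"); err != nil {
			return r.calls, err
		}
	}
	return r.calls, nil
}

func main() {
	calls, err := runWorkflows()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range calls {
		fmt.Println(c)
	}
}
```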
Use streaming compression for large packages that may exceed available memory.
Create a StreamConfig with appropriate chunk size settings.
Set ChunkSize to a reasonable size such as 1MB for processing chunks.
Enable UseTempFiles to use temporary files for large packages that exceed memory limits.
Call CompressPackageStream with the compression type and stream configuration to compress the package using streaming.
Call Write with CompressionNone to write the compressed package to the output file.
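To illustrate why a fixed chunk size keeps memory bounded, the sketch below streams data through a compressor one chunk at a time and optionally spills the output to a temporary file. It is not the NovusPack implementation: DEFLATE from the Go standard library stands in for the package's algorithms, the `StreamConfig` shown here contains only the two fields named above with assumed types, and `compressStream` and `roundTrip` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/flate"
	"errors"
	"fmt"
	"io"
	"os"
)

// StreamConfig mirrors only the two settings named above; types are assumed.
type StreamConfig struct {
	ChunkSize    int
	UseTempFiles bool
}

// compressStream reads src in chunkSize pieces and writes compressed output
// to dst, so memory use is bounded by the chunk buffer, not the input size.
func compressStream(dst io.Writer, src io.Reader, chunkSize int) (int64, error) {
	if chunkSize <= 0 {
		return 0, errors.New("chunk size must be positive")
	}
	zw, err := flate.NewWriter(dst, flate.DefaultCompression)
	if err != nil {
		return 0, err
	}
	buf := make([]byte, chunkSize)
	var total int64
	for {
		n, readErr := src.Read(buf)
		if n > 0 {
			if _, err := zw.Write(buf[:n]); err != nil {
				return total, err
			}
			total += int64(n)
		}
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			return total, readErr
		}
	}
	return total, zw.Close()
}

// roundTrip compresses size bytes of sample data, decompresses the result,
// and reports whether the restored bytes match the input.
func roundTrip(size int, cfg StreamConfig) (bool, error) {
	input := bytes.Repeat([]byte("novuspack "), size/10+1)[:size]
	var out io.ReadWriter = &bytes.Buffer{}
	if cfg.UseTempFiles {
		f, err := os.CreateTemp("", "nvpk-stream-*")
		if err != nil {
			return false, err
		}
		defer os.Remove(f.Name())
		defer f.Close()
		out = f
	}
	if _, err := compressStream(out, bytes.NewReader(input), cfg.ChunkSize); err != nil {
		return false, err
	}
	if f, ok := out.(*os.File); ok {
		// Rewind the temp file before reading the compressed data back.
		if _, err := f.Seek(0, io.SeekStart); err != nil {
			return false, err
		}
	}
	restored, err := io.ReadAll(flate.NewReader(out))
	if err != nil {
		return false, err
	}
	return bytes.Equal(input, restored), nil
}

func main() {
	ok, err := roundTrip(3<<20, StreamConfig{ChunkSize: 1 << 20, UseTempFiles: true})
	fmt.Println(ok, err)
}
```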
For extremely large packages or when maximum performance is required, use advanced streaming compression with full configuration options that align with modern best practices from 7zip, zstd, and tar.
Create a StreamConfig with intelligent defaults that allow the system to auto-detect optimal values.
Set ChunkSize to 0 for automatic calculation based on available memory.
Use an empty string for TempDir to utilize the system's temporary directory.
Set MaxMemoryUsage to 0 for automatic detection based on system RAM.
Select MemoryStrategyBalanced to use 50% of available RAM for optimal performance.
Enable AdaptiveChunking to allow the system to adjust chunk size based on memory pressure.
Enable UseDiskBuffering for intermediate buffering when memory limits are reached.
Set CleanupTempFiles to true for automatic cleanup of temporary files.
Enable UseParallelProcessing for multi-core processing.
Set MaxWorkers to 0 for automatic CPU core detection.
Set CompressionLevel to 0 for automatic selection of the optimal compression level.
Enable UseSolidCompression for better compression ratios by treating multiple files as a single stream.
Set ResumeFromOffset to 0 to start from the beginning.
Set BufferPoolSize to 0 for automatic calculation of buffer pool size.
Set MaxTempFileSize to 0 for no limit on temporary file size.
Configure a ProgressCallback function to receive real-time progress updates during compression.
Call CompressPackageStream with the ZSTD compression type and the configured settings.
Write the compressed package to the output file using Write with no additional compression.
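The auto-detecting configuration described above might look like the following literal. The field names come from the steps above, but every field type, the `MemoryStrategy` type with its numeric values, and the `ProgressCallback` signature are assumptions made for this sketch. `autoConfig` and `percentDone` are hypothetical helpers.

```go
package main

import "fmt"

// MemoryStrategy and its numeric values are assumptions for this sketch.
type MemoryStrategy int

const (
	MemoryStrategyBalanced MemoryStrategy = iota
	MemoryStrategyCustom
)

// StreamConfig lists the fields named in the steps above with assumed types.
type StreamConfig struct {
	ChunkSize             int64
	TempDir               string
	MaxMemoryUsage        int64
	MemoryStrategy        MemoryStrategy
	AdaptiveChunking      bool
	UseDiskBuffering      bool
	CleanupTempFiles      bool
	UseParallelProcessing bool
	MaxWorkers            int
	CompressionLevel      int
	UseSolidCompression   bool
	ResumeFromOffset      int64
	BufferPoolSize        int
	MaxTempFileSize       int64
	ProgressCallback      func(processed, total int64)
}

// autoConfig leaves every numeric setting at 0 so the system can choose.
func autoConfig(progress func(processed, total int64)) StreamConfig {
	return StreamConfig{
		ChunkSize:             0,  // auto: derived from available memory
		TempDir:               "", // system temporary directory
		MaxMemoryUsage:        0,  // auto: derived from system RAM
		MemoryStrategy:        MemoryStrategyBalanced,
		AdaptiveChunking:      true,
		UseDiskBuffering:      true,
		CleanupTempFiles:      true,
		UseParallelProcessing: true,
		MaxWorkers:            0, // auto: one per detected CPU core
		CompressionLevel:      0, // auto
		UseSolidCompression:   true,
		ResumeFromOffset:      0, // start from the beginning
		BufferPoolSize:        0, // auto
		MaxTempFileSize:       0, // no limit
		ProgressCallback:      progress,
	}
}

// percentDone converts a progress update into a whole percentage.
func percentDone(processed, total int64) int {
	if total <= 0 {
		return 0
	}
	return int(processed * 100 / total)
}

func main() {
	cfg := autoConfig(func(processed, total int64) {
		fmt.Printf("compressed %d%%\n", percentDone(processed, total))
	})
	cfg.ProgressCallback(512, 1024) // simulate one update from the compressor
}
```

Spelling out the zero values is redundant in Go, but it documents which settings are intentionally left to automatic detection.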
For specific memory constraints or performance requirements, configure custom memory management settings.
Set ChunkSize to a specific value such as 512MB for controlled chunk processing.
Specify a custom TempDir path for temporary file storage.
Set MaxMemoryUsage to a specific limit such as 1GB for strict memory control.
Use MemoryStrategyCustom to utilize the explicit MaxMemoryUsage value.
Disable AdaptiveChunking to prevent automatic chunk size adjustments.
Set BufferPoolSize to a specific number of buffers for predictable memory usage.
Configure MaxTempFileSize to limit individual temporary file sizes.
Enable UseParallelProcessing for multi-core utilization.
Set MaxWorkers to a specific number to limit concurrent workers.
Specify a particular CompressionLevel for consistent compression behavior.
Call CompressPackageStream with the ZSTD compression type and the custom configuration.
The compression API uses the comprehensive structured error system defined in api_core.md.
Generic Error Context: All compression error-returning functions use WrapErrorWithContext or NewPackageError with typed error context structures for type-safe error handling.
Functions like CompressPackage, DecompressPackage, CompressGeneric, and other compression operations return errors that use the generic error context helpers:
- WrapErrorWithContext[T]: Wraps errors with typed context structures
- NewPackageError[T]: Creates structured errors with typed context
- GetErrorContext[T]: Retrieves type-safe context from errors
See Generic Error Context Helpers for complete documentation.
This section describes common error conditions in compression operations.
- Security Errors: Package is already signed (cannot compress signed packages)
- Validation Errors: Invalid compression algorithm, package already compressed with different type
- Compression Errors: Compression operation failed, algorithm-specific failures
- Validation Errors: Package is not compressed, invalid compressed data format
- Compression Errors: Decompression operation failed, algorithm-specific failures
- Corruption Errors: Compressed data is corrupted, checksum validation failed
- Validation Errors: Target file exists and overwrite=false, invalid file path
- I/O Errors: I/O operation failed, disk space insufficient
- Security Errors: Insufficient permissions, access denied
This section describes error recovery strategies for compression operations.
- Package remains in original state
- No partial compression state
- Can retry with different compression type
- Package remains compressed
- Original compressed data preserved
- Can attempt recovery or use backup
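Because a failed compression leaves the package in its original state, retrying with a different compression type is safe. A minimal sketch of that retry loop follows. `compressWithFallback` is a hypothetical helper, the `compress` callback stands in for a call such as CompressPackage, and `errOutOfMemory` is an illustrative error value.

```go
package main

import (
	"errors"
	"fmt"
)

// Algo names a compression algorithm for this sketch.
type Algo string

// errOutOfMemory is an illustrative failure used by the example.
var errOutOfMemory = errors.New("insufficient memory")

// compressWithFallback tries each candidate in order and returns the first
// algorithm that succeeds. If every attempt fails, the errors are joined.
func compressWithFallback(candidates []Algo, compress func(Algo) error) (Algo, error) {
	if len(candidates) == 0 {
		return "", errors.New("no compression types to try")
	}
	var errs []error
	for _, a := range candidates {
		err := compress(a)
		if err == nil {
			return a, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", a, err))
	}
	return "", errors.Join(errs...)
}

func main() {
	used, err := compressWithFallback([]Algo{"LZMA", "Zstandard", "LZ4"}, func(a Algo) error {
		if a == "LZMA" {
			return errOutOfMemory // pretend the heaviest algorithm fails
		}
		return nil
	})
	fmt.Println(used, err)
}
```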
This section describes modern best practices for compression operations.
Our compression API aligns with modern best practices used by leading compression systems:
- ZSTD Streaming: Uses ZSTD_compressStream2 and ZSTD_decompressStream for large files
- Memory Efficiency: Constant memory usage regardless of file size
- Real-time Processing: Enables compression of files larger than available RAM
- Progress Reporting: Industry-standard progress callbacks for user feedback
- Multi-core Utilization: Automatically detects and uses available CPU cores
- Worker Pool Management: Configurable worker count for optimal performance
- Load Balancing: Distributes chunks across workers for maximum throughput
- Memory Isolation: Each worker operates within memory limits
- Configurable Chunk Size: Default 1GB chunks, adjustable based on system resources
- Adaptive Sizing: Automatically adjusts chunk size based on available memory
- Resumable Operations: Can resume from any chunk boundary
- Progress Tracking: Real-time progress reporting per chunk
- Strict Limits: Enforces maximum memory usage to prevent OOM
- Disk Fallback: Automatic fallback to disk buffering when memory limits hit
- Temporary File Management: Intelligent temp file cleanup and management
- Buffer Pooling: Reuses buffers to minimize allocation overhead
- Intelligent Defaults: Auto-detects system capabilities and sets optimal values
- Adaptive Sizing: Automatically adjusts memory usage based on available RAM
- Memory Strategies: Conservative, Balanced, Aggressive, or Custom approaches
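The parallel-processing and chunking points above can be sketched with a plain worker pool: the input is split at chunk boundaries, each worker compresses chunks independently, and results are stored by chunk index so order is preserved. This is an illustration only. DEFLATE from the Go standard library stands in for the package's algorithms, and `compressChunk`, `compressParallel`, `decompressAll`, and `roundTripParallel` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"runtime"
	"sync"
)

// compressChunk compresses one chunk independently of all others.
func compressChunk(chunk []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(chunk); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// compressParallel splits data into chunkSize pieces and compresses them on
// a pool of workers. workers <= 0 means one worker per CPU core.
func compressParallel(data []byte, chunkSize, workers int) ([][]byte, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", chunkSize)
	}
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	n := (len(data) + chunkSize - 1) / chunkSize
	out := make([][]byte, n)
	errs := make([]error, n)
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				end := (i + 1) * chunkSize
				if end > len(data) {
					end = len(data)
				}
				// Each worker writes only to its own index, so no lock is needed.
				out[i], errs[i] = compressChunk(data[i*chunkSize : end])
			}
		}()
	}
	for i := 0; i < n; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

// decompressAll restores the original bytes by decompressing chunks in order.
func decompressAll(chunks [][]byte) ([]byte, error) {
	var out bytes.Buffer
	for _, c := range chunks {
		if _, err := io.Copy(&out, flate.NewReader(bytes.NewReader(c))); err != nil {
			return nil, err
		}
	}
	return out.Bytes(), nil
}

// roundTripParallel reports the chunk count and whether the data survived.
func roundTripParallel(size, chunkSize, workers int) (int, bool, error) {
	data := bytes.Repeat([]byte("nvpk"), size/4+1)[:size]
	chunks, err := compressParallel(data, chunkSize, workers)
	if err != nil {
		return 0, false, err
	}
	restored, err := decompressAll(chunks)
	if err != nil {
		return len(chunks), false, err
	}
	return len(chunks), bytes.Equal(data, restored), nil
}

func main() {
	n, ok, err := roundTripParallel(1<<20, 64<<10, 0)
	fmt.Println(n, ok, err)
}
```

Compressing chunks independently is also what makes resuming from a chunk boundary possible, at the cost of a somewhat lower ratio than a single solid stream.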
This section describes intelligent defaults and memory management for compression.
The system provides intelligent defaults based on system capabilities:
- Use when system has limited RAM or other processes need memory
- Default for systems with <4GB RAM
- Ensures system stability during compression
- Optimal balance between performance and system stability
- Default for systems with 4-16GB RAM
- Provides good compression speed while leaving system responsive
- Maximum performance for dedicated compression systems
- Default for systems with >16GB RAM
- Use when system is dedicated to compression tasks
- Use explicit MaxMemoryUsage value
- Override automatic detection
- Useful for specific memory constraints
The system automatically detects optimal memory settings based on available system resources.
The system queries available system RAM and calculates appropriate memory limits based on the selected strategy.
For systems with less than 4GB RAM, the Conservative strategy is automatically selected, allocating 25% of total RAM for compression operations.
Systems with 4-16GB RAM use the Balanced strategy by default, utilizing 50% of available RAM for optimal performance while maintaining system responsiveness.
Systems with more than 16GB RAM automatically select the Aggressive strategy, using 75% of available RAM for maximum compression performance.
When chunk size is not explicitly specified, the system calculates an optimal chunk size as 25% of the allocated memory limit.
This ensures that each processing chunk fits comfortably within the memory constraints while allowing for multiple concurrent operations.
The system automatically detects the number of available CPU cores and sets the worker count accordingly.
This enables optimal parallel processing without overloading the system with excessive worker threads.
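The detection rules above reduce to a few lines of arithmetic. The sketch below is an assumption-laden illustration: `MemoryStrategy`, `Plan`, `strategyFor`, and `plan` are hypothetical names, the RAM figure is passed in by the caller because the Go standard library has no portable call for querying system memory, and the text's "total" and "available" RAM are treated as the same number.

```go
package main

import (
	"fmt"
	"runtime"
)

// MemoryStrategy is a hypothetical enum for the automatic strategies.
type MemoryStrategy int

const (
	Conservative MemoryStrategy = iota
	Balanced
	Aggressive
)

const gib = uint64(1) << 30

// strategyFor picks the default strategy from the amount of system RAM.
func strategyFor(ram uint64) MemoryStrategy {
	switch {
	case ram < 4*gib:
		return Conservative
	case ram <= 16*gib:
		return Balanced
	default:
		return Aggressive
	}
}

// percent returns the share of RAM the strategy allocates to compression.
func (s MemoryStrategy) percent() uint64 {
	switch s {
	case Conservative:
		return 25
	case Aggressive:
		return 75
	default:
		return 50
	}
}

// Plan holds the derived settings.
type Plan struct {
	Strategy    MemoryStrategy
	MemoryLimit uint64 // bytes allocated to compression
	ChunkSize   uint64 // 25% of the memory limit
	Workers     int    // one per CPU core
}

// plan derives memory limit, chunk size, and worker count.
func plan(ram uint64, cpus int) Plan {
	s := strategyFor(ram)
	limit := ram * s.percent() / 100
	return Plan{Strategy: s, MemoryLimit: limit, ChunkSize: limit / 4, Workers: cpus}
}

func main() {
	p := plan(8*gib, runtime.NumCPU())
	fmt.Printf("limit=%d MiB chunk=%d MiB workers=%d\n", p.MemoryLimit>>20, p.ChunkSize>>20, p.Workers)
}
```

For example, an 8GB system falls under the Balanced strategy, which gives a 4GB memory limit and a 1GB chunk size under these rules.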
- Memory Monitoring: Continuously monitors available memory during compression
- Dynamic Adjustment: Reduces chunk size if memory pressure detected
- Disk Fallback: Automatically switches to disk buffering when memory limits hit
- Buffer Pooling: Reuses buffers to minimize allocation overhead
- Temp File Rotation: Rotates temp files when they exceed MaxTempFileSize
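Two of the mechanisms above, buffer pooling and dynamic chunk adjustment, can be sketched with the standard library alone. This is one possible shape, not the library's implementation: `bufferPool` wraps `sync.Pool`, and the halving policy in `nextChunkSize` (with a lower bound) is an assumption, since the text only says the chunk size is reduced under memory pressure.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferPool hands out fixed-size byte slices and takes them back for reuse.
type bufferPool struct {
	size int
	pool sync.Pool
}

// newBufferPool creates a pool whose buffers are all size bytes long.
func newBufferPool(size int) *bufferPool {
	bp := &bufferPool{size: size}
	bp.pool.New = func() any {
		b := make([]byte, size)
		return &b // store a pointer to avoid an allocation on every Put
	}
	return bp
}

// Get returns a buffer, reusing a previously returned one when available.
func (bp *bufferPool) Get() *[]byte { return bp.pool.Get().(*[]byte) }

// Put returns a buffer to the pool for later reuse.
func (bp *bufferPool) Put(b *[]byte) { bp.pool.Put(b) }

// nextChunkSize halves the chunk size under memory pressure, never going
// below floor. Without pressure the size is left unchanged.
func nextChunkSize(current, floor int64, underPressure bool) int64 {
	if !underPressure || current <= floor {
		return current
	}
	if half := current / 2; half > floor {
		return half
	}
	return floor
}

func main() {
	bp := newBufferPool(64 << 10)
	buf := bp.Get()
	fmt.Println("buffer bytes:", len(*buf))
	bp.Put(buf)

	size := int64(1 << 20)
	for i := 0; i < 3; i++ {
		size = nextChunkSize(size, 256<<10, true)
		fmt.Println("chunk size:", size)
	}
}
```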
This section describes performance considerations for compression operations.
This section describes memory usage considerations for compression.
- Requires additional memory for compression buffers
- Memory usage scales with package size
- Consider streaming for very large packages
- Large Files: Use CompressPackageStream with appropriate memory limits and advanced configuration
- Memory Management: Automatic fallback to disk buffering when memory limits exceeded
- Requires memory for decompressed content
- May need temporary storage for large packages
- Use streaming for memory-constrained environments
- Large Files: Uses chunked decompression with temp file management
- Memory Limits: Enforces MaxMemoryUsage to prevent system OOM
This section describes CPU usage considerations for compression.
- LZ4: Lowest CPU usage
- Zstandard: Moderate CPU usage
- LZMA: Highest CPU usage
- Generally faster than compression
- LZ4: Fastest decompression
- Zstandard: Moderate decompression speed
- LZMA: Slowest decompression
This section describes I/O considerations for compression operations.
- Use streaming for large packages
- Consider disk space requirements
- Monitor I/O performance impact
This section describes the structured error system for compression operations.
The NovusPack package compression API uses a comprehensive structured error system that provides better error categorization, context, and debugging capabilities. For complete error system documentation, see Structured Error System.
This section describes common compression error types.
The NovusPack compression API uses the structured error system with the following error types:
- ErrTypeCompression: Compression and decompression operation failures
- ErrTypeValidation: Invalid compression parameters and data validation errors
- ErrTypeIO: I/O errors during compression operations
- ErrTypeContext: Context cancellation and timeout errors
- ErrTypeCorruption: Corrupted compressed data errors
- ErrTypeUnsupported: Unsupported compression algorithms and features
This section provides examples of structured errors in compression operations.
This section describes how to create compression errors.
This section defines error context type definitions used in compression errors.
// Define error context types
type CompressionErrorContext struct {
	Algorithm string
	Level     int
	InputSize int64
	Operation string
}

// UnsupportedCompressionErrorContext provides error context for unsupported compression type errors.
type UnsupportedCompressionErrorContext struct {
	CompressionType uint8
	SupportedTypes  []uint8
	Operation       string
}

// MemoryErrorContext provides error context for memory-related compression errors.
type MemoryErrorContext struct {
	RequiredMemory  string
	AvailableMemory string
	Algorithm       string
	Operation       string
}

// Compression failure with typed context
err := NewPackageError(ErrTypeCompression, "compression failed", nil, CompressionErrorContext{
	Algorithm: "Zstd",
	Level:     6,
	InputSize: 1024 * 1024,
	Operation: "CompressPackage",
})

// Unsupported compression type with typed context
err := NewPackageError(ErrTypeUnsupported, "unsupported compression type", nil, UnsupportedCompressionErrorContext{
	CompressionType: 99,
	SupportedTypes:  []uint8{0, 1, 2, 3},
	Operation:       "SetCompressionType",
})

// Memory error with typed context
err := NewPackageError(ErrTypeIO, "insufficient memory", nil, MemoryErrorContext{
	RequiredMemory:  "512MB",
	AvailableMemory: "256MB",
	Algorithm:       "LZMA",
	Operation:       "CompressPackage",
})

Use the structured error system to handle compression errors appropriately.
Check error types and extract context information for proper error handling and logging.
Handle different error categories (compression, I/O, context) with appropriate responses.
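A handler that checks the error type and extracts typed context might look like the sketch below. The real `PackageError`, `NewPackageError`, and `GetErrorContext` are defined in api_core.md. The minimal versions here are stand-ins written only so the example runs, and their fields and signatures are assumptions. `describe` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorType is a stand-in for the error categories of the structured system.
type ErrorType int

const (
	ErrTypeCompression ErrorType = iota
	ErrTypeIO
	ErrTypeContext
)

// PackageError is a minimal stand-in for the structured error type.
type PackageError struct {
	Type    ErrorType
	Message string
	Cause   error
	Context any
}

func (e *PackageError) Error() string { return e.Message }
func (e *PackageError) Unwrap() error { return e.Cause }

// NewPackageError creates a structured error carrying a typed context value.
func NewPackageError[T any](t ErrorType, msg string, cause error, ctx T) *PackageError {
	return &PackageError{Type: t, Message: msg, Cause: cause, Context: ctx}
}

// GetErrorContext returns the context if err wraps a PackageError whose
// context has type T.
func GetErrorContext[T any](err error) (T, bool) {
	var zero T
	var pe *PackageError
	if !errors.As(err, &pe) {
		return zero, false
	}
	ctx, ok := pe.Context.(T)
	return ctx, ok
}

// CompressionErrorContext matches the context type shown earlier.
type CompressionErrorContext struct {
	Algorithm string
	Level     int
	InputSize int64
	Operation string
}

// describe maps an error to a log-friendly message by category.
func describe(err error) string {
	var pe *PackageError
	if !errors.As(err, &pe) {
		return "unstructured error: " + err.Error()
	}
	switch pe.Type {
	case ErrTypeCompression:
		if c, ok := GetErrorContext[CompressionErrorContext](err); ok {
			return fmt.Sprintf("%s failed in %s (level %d)", c.Algorithm, c.Operation, c.Level)
		}
		return "compression failure: " + pe.Message
	case ErrTypeIO:
		return "I/O failure: " + pe.Message
	case ErrTypeContext:
		return "cancelled or timed out: " + pe.Message
	default:
		return pe.Message
	}
}

func main() {
	err := NewPackageError(ErrTypeCompression, "compression failed", nil, CompressionErrorContext{
		Algorithm: "Zstd", Level: 6, InputSize: 1024 * 1024, Operation: "CompressPackage",
	})
	// Wrapping with %w keeps the structured error reachable through errors.As.
	fmt.Println(describe(fmt.Errorf("build step: %w", err)))
}
```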