Compression

ZIGX compresses archives with Zstandard (zstd), balancing compression ratio against speed.

Compression Algorithm

Overview

ZIGX uses Zstandard (zstd), developed by Facebook/Meta:

  • Industry-leading compression - Better ratios than gzip/deflate
  • Extremely fast decompression - 139+ MB/s on typical data
  • Flexible compression levels - 1 (fastest) to 22 (best ratio)
  • Cross-platform - Via zstd.zig Zig bindings

Key Features

  • Zstd Frame Format: Built-in content size and checksums
  • Compression Levels 1-22: Fine-grained control over speed vs ratio
  • CRC32 Checksums: Additional integrity verification for ZIGX payload
  • SHA-256 Hashes: File-level integrity in archive headers
  • Dictionary Support: Train dictionaries for similar files
  • Long-Distance Matching: Better compression for large files
  • Adaptive Compression: Auto-detect content type and optimize settings
  • Progress Callbacks: Track compression progress for large archives
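The two integrity features listed above can be illustrated with the Zig standard library alone. This is a minimal sketch, not ZIGX's actual implementation: the helper names `payloadCrc32` and `fileSha256` are hypothetical, and how ZIGX lays these values out in its headers is not shown here.

```zig
const std = @import("std");

/// CRC32 over a payload, as a quick corruption check.
/// Hypothetical helper; ZIGX's own function names may differ.
fn payloadCrc32(payload: []const u8) u32 {
    return std.hash.Crc32.hash(payload);
}

/// SHA-256 digest of a file's contents, as a stronger per-file check.
/// Hypothetical helper; ZIGX's own function names may differ.
fn fileSha256(contents: []const u8) [32]u8 {
    var digest: [32]u8 = undefined;
    std.crypto.hash.sha2.Sha256.hash(contents, &digest, .{});
    return digest;
}

pub fn main() void {
    const data = "123456789";
    // "123456789" is the standard CRC-32 check input (check value 0xcbf43926).
    std.debug.print("crc32  = 0x{x:0>8}\n", .{payloadCrc32(data)});
    const digest = fileSha256(data);
    std.debug.print("sha256 = {d} bytes, first byte 0x{x:0>2}\n", .{ digest.len, digest[0] });
}
```

CRC32 is cheap and catches accidental corruption. SHA-256 costs more but also detects deliberate tampering, which is presumably why it is applied per file.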

Compression Levels

ZIGX offers multiple compression modes mapped to zstd levels:

ULTRA (zstd Level 22)

zig
.level = .ultra
  • Ratio: ~17-22% of original (78-83% space saved)
  • Speed: Slowest (~5 MB/s compression)
  • Use Case: Maximum compression, archival storage

Maximum compression with ultra-deep search.

BEST (zstd Level 19)

zig
.level = .best
  • Ratio: ~19-25% of original (75-81% space saved)
  • Speed: Slow (~14 MB/s compression)
  • Use Case: Distribution packages, long-term storage

Maximum practical compression with extensive search.

BALANCED (zstd Level 6)

zig
.level = .balanced
  • Ratio: ~21-26% of original (74-79% space saved)
  • Speed: Moderate (~80 MB/s compression)
  • Use Case: Good balance between speed and ratio

Good middle ground between fast and best compression.

DEFAULT (zstd Level 3)

zig
.level = .default
  • Ratio: ~21-28% of original (72-79% space saved)
  • Speed: Balanced (~120 MB/s compression)
  • Use Case: General purpose, most applications

Zstd's default level - excellent balance between ratio and speed.

FAST (zstd Level 1)

zig
.level = .fast
  • Ratio: ~25-33% of original (67-75% space saved)
  • Speed: Fastest compression (~130 MB/s)
  • Use Case: Development, CI/CD pipelines

Prioritizes speed while still achieving good compression.

STORE (No compression)

zig
.level = .none
  • Ratio: 100%+ (header overhead only)
  • Speed: I/O-bound (no compression work is done)
  • Use Case: Already compressed files, archives

Stores files uncompressed, adding only archive headers.
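The mapping from named modes to zstd levels described above can be summarized in a few lines. The `Mode` enum below is a hypothetical stand-in written for illustration; it is not the definition of `zigx.CompressionLevel`, and it assumes a recent Zig toolchain.

```zig
const std = @import("std");

/// Hypothetical stand-in for the named modes; not ZIGX's real type.
const Mode = enum {
    none,
    fast,
    default,
    balanced,
    best,
    ultra,

    /// zstd level for this mode, or null for store mode (no compression).
    fn zstdLevel(self: Mode) ?u8 {
        return switch (self) {
            .none => null,
            .fast => 1,
            .default => 3,
            .balanced => 6,
            .best => 19,
            .ultra => 22,
        };
    }
};

pub fn main() void {
    const modes = [_]Mode{ .ultra, .best, .balanced, .default, .fast, .none };
    for (modes) |mode| {
        if (mode.zstdLevel()) |level| {
            std.debug.print("{s} -> zstd level {d}\n", .{ @tagName(mode), level });
        } else {
            std.debug.print("{s} -> stored without compression\n", .{@tagName(mode)});
        }
    }
}
```

Store mode returns null rather than 0 because zstd itself treats level 0 as "use the default level", so a numeric 0 would be ambiguous.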

All 22 Levels

For fine-grained control, use .level_1 through .level_22 or CompressionLevel.custom(n):

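| Level Range | Description | Speed | Ratio |
| --- | --- | --- | --- |
| 1-3 | Fast | Very fast | Good |
| 4-9 | Balanced | Moderate | Better |
| 10-15 | High | Slow | Very Good |
| 16-19 | Best | Very slow | Excellent |
| 20-22 | Ultra | Extremely slow | Maximum |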
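As a rough illustration of the ranges in the table above, the sketch below classifies a raw zstd level into its band and rejects values outside 1-22. `Band` and `bandForLevel` are hypothetical names used only for this example; ZIGX may validate levels differently.

```zig
const std = @import("std");

/// Bands from the level table; hypothetical type for illustration only.
const Band = enum { fast, balanced, high, best, ultra };

/// Returns the band for a zstd level, or null when the level is outside 1-22.
fn bandForLevel(level: u8) ?Band {
    return switch (level) {
        1...3 => Band.fast,
        4...9 => Band.balanced,
        10...15 => Band.high,
        16...19 => Band.best,
        20...22 => Band.ultra,
        else => null,
    };
}

pub fn main() void {
    const samples = [_]u8{ 1, 6, 10, 19, 22, 0 };
    for (samples) |level| {
        if (bandForLevel(level)) |band| {
            std.debug.print("level {d}: {s}\n", .{ level, @tagName(band) });
        } else {
            std.debug.print("level {d}: outside 1-22\n", .{level});
        }
    }
}
```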

Custom Compression Levels (1-22)

Use CompressionLevel.custom(n) to specify any zstd level from 1-22:

zig
const zigx = @import("zigx");

// Use any zstd level
const result = try zigx.bundle(.{
    .allocator = allocator,
    .include = &.{"src"},
    .output_path = "bundle.zigx",
    .level = zigx.CompressionLevel.custom(10),  // zstd level 10
});

// Using preset configurations
const config = zigx.configWithLevel(15);  // Config with custom level 15
const config_ldm = zigx.configWithLevelAndLdm(18);  // Level 18 + LDM

// Get raw level value
const level = zigx.CompressionLevel.best;
const raw_value = level.toInt();  // Returns 19

Custom Level Guidelines

| zstd Level | Speed | Ratio | Best For |
| --- | --- | --- | --- |
| 1-3 | ★★★★★ | ★★☆☆☆ | Speed priority, real-time, CI/CD |
| 4-9 | ★★★★☆ | ★★★☆☆ | General purpose, balanced |
| 10-15 | ★★★☆☆ | ★★★★☆ | Good compression, reasonable speed |
| 16-19 | ★★☆☆☆ | ★★★★★ | High compression, distribution |
| 20-22 | ★☆☆☆☆ | ★★★★★ | Maximum compression, archival |

Advanced Features

Adaptive Compression

Automatically detect content type and select optimal settings:

zig
// Let ZIGX analyze and choose optimal settings
const compressed = try zigx.compressDataAdaptive(data, allocator);

// Or get analysis first
const analysis = zigx.analyzeCompressibility(data);
if (analysis.is_likely_compressed) {
    // Use store mode for already-compressed data
    const result = try zigx.bundle(.{
        .allocator = allocator,
        .level = .none,
        // ...
    });
    defer result.deinit();
}
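How `analyzeCompressibility` decides that data is "likely compressed" is not documented here. One common heuristic is byte entropy: compressed or encrypted data tends to approach 8 bits per byte. The sketch below shows that heuristic under stated assumptions: the function names and the 7.5 threshold are illustrative, not ZIGX's, and it assumes a recent Zig toolchain.

```zig
const std = @import("std");

/// Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0).
fn byteEntropy(data: []const u8) f64 {
    if (data.len == 0) return 0.0;

    // Count how often each byte value occurs.
    var counts = [_]usize{0} ** 256;
    for (data) |b| counts[b] += 1;

    const total: f64 = @floatFromInt(data.len);
    var entropy: f64 = 0.0;
    for (counts) |c| {
        if (c == 0) continue;
        const p = @as(f64, @floatFromInt(c)) / total;
        entropy -= p * @log2(p);
    }
    return entropy;
}

/// Assumed threshold: above 7.5 bits/byte the data is probably
/// already compressed or encrypted. The cut-off is illustrative.
fn looksCompressed(data: []const u8) bool {
    return byteEntropy(data) > 7.5;
}

pub fn main() void {
    const text = "aaaaaaaaaaaaaaaa";
    std.debug.print("entropy = {d:.2} bits/byte, compressed? {}\n", .{
        byteEntropy(text),
        looksCompressed(text),
    });
}
```

An entropy check needs a reasonably large sample to be meaningful; on very short inputs the estimate is dominated by noise.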

Content Type Detection

ZIGX can detect content types to optimize compression:

| Content Type | Detection | Recommended |
| --- | --- | --- |
| Source code | Keywords, patterns | .best |
| Text/Config | ASCII ratio | .best |
| JSON/XML | Magic bytes | .best |
| Images | PNG/JPEG headers | .none |
| Archives | ZIP/GZ headers | .none |
| Executables | ELF/PE headers | .default |
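The header-based rows of the table above (images, archives, executables) rely on well-known file signatures. The sketch below checks those signatures and returns the recommendation from the table. `Kind`, `sniff`, and `recommendedLevel` are hypothetical names; ZIGX's real detector is not shown here and may inspect more than the leading bytes.

```zig
const std = @import("std");

/// File kinds recognizable from their leading bytes (illustrative subset).
const Kind = enum { png, jpeg, zip, gzip, elf, pe, unknown };

/// Identify a file kind from its standard magic bytes.
fn sniff(data: []const u8) Kind {
    if (std.mem.startsWith(u8, data, "\x89PNG\r\n\x1a\n")) return .png;
    if (std.mem.startsWith(u8, data, "\xff\xd8\xff")) return .jpeg;
    if (std.mem.startsWith(u8, data, "PK\x03\x04")) return .zip;
    if (std.mem.startsWith(u8, data, "\x1f\x8b")) return .gzip;
    if (std.mem.startsWith(u8, data, "\x7fELF")) return .elf;
    if (std.mem.startsWith(u8, data, "MZ")) return .pe;
    return .unknown;
}

/// Recommendation per the table; null when the header alone says nothing.
fn recommendedLevel(kind: Kind) ?[]const u8 {
    return switch (kind) {
        .png, .jpeg, .zip, .gzip => ".none",
        .elf, .pe => ".default",
        .unknown => null,
    };
}

pub fn main() void {
    const header = "\x89PNG\r\n\x1a\n";
    const kind = sniff(header);
    const rec: []const u8 = recommendedLevel(kind) orelse "no recommendation";
    std.debug.print("{s}: {s}\n", .{ @tagName(kind), rec });
}
```

Text-like content (source code, config, JSON) has no reliable signature, which is why the table lists keyword and ASCII-ratio checks for those rows instead.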

Dictionary Compression

Train dictionaries from sample data for better compression of similar files:

zig
// Train dictionary from sample files
var samples = [_][]const u8{ log1, log2, log3 };
var dict = try zigx.Dictionary.train(&samples, 32768, allocator);
defer dict.deinit();

// Save for reuse
try dict.save("logs.dict");

// Use in compression
const opts = zigx.AdvancedOptions{
    .level = .best,
    .dictionary = &dict,
};

Best for: Log files, config files, JSON documents, similar structured data.

Long-Distance Matching

For large files with repeated patterns far apart:

zig
const opts = zigx.AdvancedOptions{
    .level = .best,
    .long_distance_matching = true,
    .window_log = 25,  // 32MB window
};

Best for: Large log files, database dumps, backup archives.
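The `window_log` value is the base-2 logarithm of the match window, so the window size in bytes is 2 raised to that value. A small sketch of the arithmetic follows; `windowBytes` is a hypothetical helper, not part of the ZIGX API.

```zig
const std = @import("std");

/// Window size in bytes for a given window_log (size = 2^window_log).
fn windowBytes(window_log: u5) u64 {
    return @as(u64, 1) << window_log;
}

pub fn main() void {
    const logs = [_]u5{ 10, 25, 27, 31 };
    for (logs) |wl| {
        std.debug.print("window_log {d} -> {d} bytes\n", .{ wl, windowBytes(wl) });
    }
}
```

With `window_log = 25` this gives 33,554,432 bytes, the 32MB window used in the example above. Each increment doubles the window, and the decompressor needs a matching amount of memory.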

Progress Tracking

Monitor compression progress with detailed events:

zig
fn onProgress(info: zigx.ProgressInfo, ctx: ?*anyopaque) void {
    _ = ctx;
    switch (info.event) {
        .scanning => std.debug.print("Scanning files...\n", .{}),
        .reading_file => {
            if (info.current_file) |file| {
                std.debug.print("\rReading: {s}", .{file});
            }
        },
        .compressing => {
            std.debug.print("\rCompressing... {d:.1}%", .{info.getPercent()});
        },
        .writing => std.debug.print("\rWriting archive...", .{}),
        .finalizing => std.debug.print("\rFinalizing...", .{}),
    }
}

const result = try zigx.bundle(.{
    .allocator = allocator,
    .include = &.{"src"},
    .output_path = "bundle.zigx",
    .progress_callback = onProgress,
    .progress_context = null,
});

Progress Events

| Event | Description |
| --- | --- |
| scanning | Scanning directories for files |
| reading_file | Reading a file from disk |
| compressing | Compressing data with zstd |
| writing | Writing compressed data to archive |
| finalizing | Writing header and checksums |
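The callback above prints `info.getPercent()`, whose internals are not documented here. A percentage like this is typically derived from work done over total work, with a guard for the empty case. The helper below is a hypothetical sketch of that calculation, not ZIGX's implementation.

```zig
const std = @import("std");

/// Progress as a percentage; returns 0.0 when the total is unknown (zero).
/// Hypothetical helper, shown only to illustrate the arithmetic.
fn percent(done: u64, total: u64) f64 {
    if (total == 0) return 0.0;
    return @as(f64, @floatFromInt(done)) * 100.0 / @as(f64, @floatFromInt(total));
}

pub fn main() void {
    std.debug.print("{d:.1}%\n", .{percent(50, 200)});
    std.debug.print("{d:.1}%\n", .{percent(0, 0)});
}
```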

Preset Configurations

Use preset configurations for common scenarios:

zig
// Quick presets (returns CompressionConfig)
const fast_config = zigx.configFast();        // Level 1
const balanced_config = zigx.configBalanced(); // Level 6
const best_config = zigx.configBest();         // Level 19
const ultra_config = zigx.configUltra();       // Level 22

// Custom level presets
const level_config = zigx.configWithLevel(15);          // Any level 1-22
const ldm_config = zigx.configWithLevelAndLdm(18);      // Level + LDM

// Scenario-specific presets
const archival = zigx.configForArchiving();      // Ultra + LDM
const large_files = zigx.configForLargeFiles();  // Best + LDM + 32MB window
const distribution = zigx.configForDistribution(); // Best, optimized for packages

// Adaptive compression
const adaptive = zigx.configAdaptive();  // Auto-detect content type

Preset Configuration Table

| Preset | Level | LDM | Window | Best For |
| --- | --- | --- | --- | --- |
| configFast() | 1 | | Default | Speed priority |
| configBalanced() | 6 | | Default | General purpose |
| configBest() | 19 | | Default | Distribution |
| configUltra() | 22 | | 128MB | Maximum compression |
| configAdaptive() | Auto | | Default | Mixed content |
| configForLargeFiles() | 6 | | 32MB | Large files |
| configForArchiving() | 19 | | Default | Long-term storage |
| configForDistribution() | 19 | | Default | Package releases |
| configWithLevel(n) | n | | Default | Custom level |
| configWithLevelAndLdm(n) | n | | Default | Custom + LDM |

ConfigBuilder Pattern

Build custom configurations with the fluent builder API:

zig
var builder = zigx.ConfigBuilder.init();
const cfg = builder
    .compressionLevel(.best)
    .adaptive(true)
    .longDistanceMatching(true)
    .windowLog(25)  // 32MB window
    .threads(4)
    .verbose(true)
    .build();

ConfigBuilder Methods

| Method | Description |
| --- | --- |
| compressionLevel(level) | Set compression level |
| customLevel(n) | Set custom level (1-22) |
| compressionEnabled(bool) | Enable/disable compression |
| adaptive(bool) | Enable adaptive compression |
| longDistanceMatching(bool) | Enable LDM |
| windowLog(?u5) | Set window log (10-31); window size is 2^n bytes |
| excludePatterns([]const []const u8) | Set exclude patterns |
| includeHidden(bool) | Include hidden files |
| threads(u8) | Set thread count (0=auto) |
| verbose(bool) | Enable verbose output |
| build() | Build final Config |

Compression Comparison

Benchmark results on typical project files:

| Mode | Size (bytes) | Ratio | Space Saved |
| --- | --- | --- | --- |
| BEST | 30,142 | 19.3% | 80.7% |
| DEFAULT | 33,351 | 21.4% | 78.6% |
| FAST | 39,346 | 25.2% | 74.8% |
| STORE | 157,833 | 101.3% | -1.3% |
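The Ratio and Space Saved columns are two views of the same number: ratio is archive size as a percentage of the original, and space saved is 100 minus that. The sketch below shows the arithmetic with made-up sizes, since the original input size for the benchmark is not stated. The function names are hypothetical.

```zig
const std = @import("std");

/// Archive size as a percentage of the original size.
fn ratioPercent(archive_size: u64, original_size: u64) f64 {
    if (original_size == 0) return 0.0;
    return @as(f64, @floatFromInt(archive_size)) * 100.0 /
        @as(f64, @floatFromInt(original_size));
}

/// Space saved in percent; negative when the archive is larger than the input.
fn savedPercent(archive_size: u64, original_size: u64) f64 {
    return 100.0 - ratioPercent(archive_size, original_size);
}

pub fn main() void {
    // Illustrative sizes only, not the benchmark's real input.
    const original: u64 = 1000;
    const sizes = [_]u64{ 193, 252, 1013 };
    for (sizes) |archive| {
        std.debug.print("{d} bytes: ratio {d:.1}%, saved {d:.1}%\n", .{
            archive,
            ratioPercent(archive, original),
            savedPercent(archive, original),
        });
    }
}
```

The last case mirrors the STORE row: with no compression, header overhead pushes the ratio slightly above 100% and the savings below zero.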

By Data Type

| Data Type | Space Saved | Notes |
| --- | --- | --- |
| Text/Source | ~81-82% | Excellent |
| Log files | Up to 99.9% | Outstanding (repetitive) |
| Random/Encrypted | ~0% | Incompressible |
| Mixed/Binary | ~0-18% | Varies |

Algorithm Versioning

ZIGX tracks compression algorithm versions to ensure compatibility:

zig
const info = try zigx.getArchiveInfo("archive.zigx", allocator);
std.debug.print("Compression Version: v{d}\n", .{info.compression_version});

Version 1 (Current)

  • Zstandard (zstd) compression
  • Levels 1-22 via zstd.c.ZSTD_compress()
  • Built-in frame format with content size
  • CRC32 payload checksums

Best Practices

Choose the Right Level

| Scenario | Recommended Level | Why |
| --- | --- | --- |
| Release builds | .best | Maximum compression |
| Daily builds | .default | Good ratio, fast |
| CI/CD pipelines | .fast | Speed priority |
| Pre-compressed files | .none | Avoid overhead |

File Type Considerations

Some file types don't compress well:

  • Already compressed: .zip, .gz, .zst, .png, .jpg, .mp4
  • Encrypted files: Random byte distribution
  • Random binary data: No patterns to compress

For these, zstd detects incompressible blocks and stores them raw, so the overhead stays minimal.
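When the file list is known up front, a cheap extension check can route such files to store mode before any compression is attempted. The sketch below uses the extensions listed above. `isPrecompressed` is a hypothetical helper, and the `.default` fallback is just an example choice, not ZIGX behavior.

```zig
const std = @import("std");

/// Extensions from the list above that rarely benefit from recompression.
const precompressed = [_][]const u8{ ".zip", ".gz", ".zst", ".png", ".jpg", ".mp4" };

/// True when the path's extension matches a known compressed format.
fn isPrecompressed(path: []const u8) bool {
    const ext = std.fs.path.extension(path);
    for (precompressed) |known| {
        if (std.ascii.eqlIgnoreCase(ext, known)) return true;
    }
    return false;
}

pub fn main() void {
    const paths = [_][]const u8{ "assets/logo.png", "src/main.zig", "backup.tar.gz" };
    for (paths) |path| {
        // Example policy: store known-compressed files, compress the rest.
        const mode: []const u8 = if (isPrecompressed(path)) ".none" else ".default";
        std.debug.print("{s}: {s}\n", .{ path, mode });
    }
}
```

Extensions can be wrong or missing, so a check like this works best as a fast first pass in front of content-based detection.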

Performance Tips

  1. Use .default for most cases - zstd level 3 is well-optimized
  2. Reserve .best for final releases - Much slower but best ratio
  3. Use .fast in development - Quick iteration cycles
  4. Batch similar files - Better compression on similar content

API Example

zig
const std = @import("std");
const zigx = @import("zigx");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Compare compression levels using bundle() alias
    const levels = [_]zigx.CompressionLevel{ .best, .default, .fast, .none };
    
    for (levels) |level| {
        const result = try zigx.bundle(.{
            .allocator = allocator,
            .include = &.{"src"},
            .output_path = "test.zigx",
            .level = level,
        });
        defer result.deinit();
        
        std.debug.print("{s}: {d} bytes ({d:.1}% - saved {d:.1}%)\n", .{
            level.name(),
            result.archive_size,
            result.getCompressionRatio() * 100,
            result.getCompressionPercent(),
        });
    }
}

Comparison with Other Formats

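| Format | Typical Space Saved | Compression Speed | Decompression Speed |
| --- | --- | --- | --- |
| ZIGX | 75-81% | 117+ MB/s | 139+ MB/s |
| ZIP (deflate) | 60-70% | Medium | Medium |
| GZIP | 60-70% | Medium | Medium |
| 7-Zip (LZMA) | 70-80% | Slow | Slow |
| LZ4 | 50-60% | Very Fast | Very Fast |
| Zstd | 65-75% | Fast | Very Fast |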

ZIGX uses zstd internally, achieving excellent ratios with fast decompression.

Released under the Apache License 2.0.