
Performance Optimization

Practical tips for maximizing MediaGit throughput and minimizing storage costs.

Parallel Add

The single biggest performance lever. By default, MediaGit uses all available CPU cores.

# Let MediaGit choose (default: all CPUs)
mediagit add assets/

# Explicit job count
mediagit add --jobs 16 assets/

# Disable parallelism (for debugging or resource-constrained systems)
mediagit add --no-parallel assets/

Expected throughput (validated benchmarks, release build):

| File type | Throughput | Notes |
|---|---|---|
| PSD (72–181 MB) | 72–119 MB/s | Zstd Best; layer data compresses well |
| MP4/MOV (5–398 MB) | 146–174 MB/s | Pre-compressed; store-mode, zero CPU overhead |
| GLB (14–25 MB) | 3.0–5.2 MB/s | GLB parser + CDC chunking + Zstd |
| WAV (55–57 MB) | 2.1–3.6 MB/s | RIFF parser + chunking (CPU-bound) |
| Pre-compressed (JPEG, USDZ) | 25–182 MB/s | Direct write, no chunking |
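
The table's rates make it easy to budget ingest time. A minimal sketch (the `estimated_seconds` helper is illustrative, not part of MediaGit) using a mid-range MP4 rate of ~150 MB/s:

```python
# Rough wall-clock estimate for an ingest, based on the measured rates above.
def estimated_seconds(total_mb: float, rate_mb_s: float) -> float:
    """Return estimated ingest time in seconds at a given throughput."""
    return total_mb / rate_mb_s

# 20 GB of MP4 footage at ~150 MB/s (mid-range of 146-174 MB/s)
secs = estimated_seconds(20 * 1024, 150)
print(f"{secs:.0f} s (~{secs / 60:.1f} min)")
```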

Compression Strategy

MediaGit automatically selects the best compression strategy per file type. You can tune the global defaults:

# .mediagit/config.toml
[compression]
algorithm = "zstd"
level = 3      # 1 (fast) → 22 (best). Default 3 is optimal for most cases.
min_size = 1024  # Don't compress files smaller than 1 KB

Format-Specific Behavior

MediaGit never wastes CPU re-compressing already-compressed formats:

| Format | Strategy | Reason |
|---|---|---|
| JPEG, PNG, WebP | Store (level 0) | Already compressed |
| MP4, MOV, AVI | Store | Already compressed |
| ZIP, DOCX, XLSX | Store | ZIP container |
| PDF, AI, InDesign | Store | Contains compressed streams |
| PSD | Zstd Best | Raw layer data compresses well |
| OBJ, FBX, GLB, STL | Zstd Best | Binary 3D data |
| WAV, FLAC | Zstd Default | Uncompressed audio |
| Text, JSON, TOML | Zstd Default | Highly compressible |
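
Conceptually, this is a per-extension strategy lookup with a compressible fallback. A hypothetical sketch (`Strategy`, `STRATEGY_BY_EXT`, and `pick_strategy` are illustrative names, not MediaGit internals):

```python
# Hypothetical sketch of the per-format strategy table above.
from enum import Enum

class Strategy(Enum):
    STORE = "store"          # write bytes as-is, zero CPU
    ZSTD_DEFAULT = "zstd-3"  # balanced speed/ratio
    ZSTD_BEST = "zstd-22"    # maximum ratio, slowest

STRATEGY_BY_EXT = {
    "jpg": Strategy.STORE, "png": Strategy.STORE, "mp4": Strategy.STORE,
    "zip": Strategy.STORE, "pdf": Strategy.STORE,
    "psd": Strategy.ZSTD_BEST, "obj": Strategy.ZSTD_BEST, "glb": Strategy.ZSTD_BEST,
    "wav": Strategy.ZSTD_DEFAULT, "json": Strategy.ZSTD_DEFAULT,
}

def pick_strategy(filename: str) -> Strategy:
    """Unknown extensions fall back to the default Zstd level."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return STRATEGY_BY_EXT.get(ext, Strategy.ZSTD_DEFAULT)

print(pick_strategy("hero.PSD").value)   # zstd-22
print(pick_strategy("clip.mp4").value)   # store
```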

Delta Encoding

For versioned files that change incrementally (e.g., evolving PSD files), MediaGit uses delta encoding to store only the differences between versions:

# Similarity thresholds (in smart_compressor.rs — not yet configurable via TOML)
# AI/PDF files: 15% similarity → try delta encoding
# Office docs: 20% similarity → try delta encoding
# General: 80% similarity threshold

Delta chains are capped at depth 10 to prevent slow reads on deeply-chained objects.
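
The cap trades storage for read latency: each read must walk the chain back to a full object. A minimal sketch of that decision, assuming hypothetical names (`should_store_as_delta` is not a MediaGit API):

```python
# Illustrative sketch of the depth-10 delta-chain cap described above.
MAX_DELTA_DEPTH = 10

def should_store_as_delta(base_depth: int, similarity: float,
                          threshold: float = 0.80) -> bool:
    """Store a new version as a delta only if the base's chain is short
    enough (reads walk the whole chain) and the files are similar enough."""
    return base_depth < MAX_DELTA_DEPTH and similarity >= threshold

# A version whose base already sits at depth 10 is stored in full,
# resetting the chain and bounding read cost.
print(should_store_as_delta(base_depth=3, similarity=0.92))   # True
print(should_store_as_delta(base_depth=10, similarity=0.99))  # False
```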

Chunking

Large files are split into chunks for efficient deduplication and parallel transfer. MediaGit uses different chunkers per file type:

| File size / type | Chunker | Typical chunk count |
|---|---|---|
| < 10 MB | FastCDC (small) | 2–10 |
| 10–100 MB | FastCDC (medium) | 10–100 |
| > 100 MB | StreamCDC | 100–2000 |
| MP4 / MKV / WebM | Video container-aware | 1 per GOP |
| WAV | Audio-aware | Fixed-size segments |
| PSD | Layer-aware | 1 per layer group |

Deduplication: Identical chunks across files or versions are stored only once. For a 6 GB CSV dataset, this yielded 83% storage savings in testing.
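
The mechanism behind those savings is content addressing: each chunk is keyed by its hash, so a chunk seen twice is stored once. A minimal sketch (fixed-size chunks for brevity; MediaGit uses content-defined chunking, which dedups better across shifted data):

```python
# Minimal content-addressed dedup sketch: hash each chunk, store unique
# chunks once, and represent a file as a list of chunk hashes.
import hashlib

CHUNK_SIZE = 4  # tiny for the demo; real chunks are KB-MB sized

def add_file(store: dict, data: bytes) -> list:
    """Split data into chunks, store unseen chunks, return the hash list."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # identical chunks stored only once
        refs.append(digest)
    return refs

store = {}
v1 = add_file(store, b"AAAABBBBCCCC")
v2 = add_file(store, b"AAAABBBBDDDD")  # shares two of three chunks with v1
logical = 12 + 12                              # bytes the user "sees"
physical = sum(len(c) for c in store.values()) # bytes actually stored
print(f"saved {1 - physical / logical:.0%}")   # saved 33%
```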

Storage Backend Performance

Cloud backend upload speeds depend on network, not MediaGit:

| Backend | Upload | Download | Notes |
|---|---|---|---|
| Local filesystem | 200–500 MB/s | 200–500 MB/s | Limited by disk I/O |
| MinIO (local) | 100–300 MB/s | 200–500 MB/s | Validated: 108 MB/s upload |
| Amazon S3 | 50–200 MB/s | 100–400 MB/s | Depends on region + instance |
| Azure Blob | 50–150 MB/s | 100–300 MB/s | |
| Google Cloud Storage | 50–200 MB/s | 100–400 MB/s | |

S3 Transfer Optimization

For many large objects, raise connection concurrency and timeouts in .mediagit/config.toml:

[performance]
max_concurrency = 32  # More parallel uploads

[performance.connection_pool]
max_connections = 32

[performance.timeouts]
request = 300  # 5 min for very large files
write = 120

Memory Usage

Cache settings control how much object data MediaGit keeps in memory:

[performance.cache]
enabled = true
max_size = 1073741824  # 1 GB (for large repos)
ttl = 7200             # 2 hours

For workstations with < 8 GB RAM, reduce to 256 MB:

max_size = 268435456  # 256 MB
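
max_size is in bytes, so a quick sanity check of the two values above (the `mib_to_bytes` helper is just for illustration):

```python
# Verify the byte counts quoted in the config snippets above.
def mib_to_bytes(mib: int) -> int:
    return mib * 1024 * 1024

print(mib_to_bytes(1024))  # 1073741824 (1 GB)
print(mib_to_bytes(256))   # 268435456  (256 MB)
```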

Repository Maintenance

Garbage Collection

Run after many branch deletions or partial operations:

mediagit gc

GC removes unreferenced objects and repacks data. Safe to run any time.

Verify Integrity

# Quick check (metadata only)
mediagit fsck

# Full cryptographic verification
mediagit verify

Statistics

mediagit stats

Shows compression ratio, deduplication rate, object count, and chunk distribution by file category.

Profiling

For investigating performance bottlenecks in development:

# Enable trace-level logging
RUST_LOG=mediagit_versioning=trace mediagit add large-file.psd

# Benchmark specific operations
cargo bench -p mediagit-compression

CI/CD Performance Tips

  • Cache the binary: Download once, cache with actions/cache, skip re-download on subsequent runs
  • Parallel jobs: Match --jobs to the CI runner’s CPU count (nproc on Linux)
  • Avoid re-verifying in CI: mediagit fsck is fast; mediagit verify does full SHA-256 re-reads and is slower
  • Use regional buckets: Place S3 buckets in the same region as your CI runners
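
The `--jobs` tip can be sketched as a Linux CI step; the mediagit invocation is commented out so the snippet runs anywhere:

```shell
# Match MediaGit's worker count to the CI runner's CPU count (Linux).
JOBS=$(nproc)
echo "parallel jobs: $JOBS"
# mediagit add --jobs "$JOBS" assets/
```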

See Also