Performance Optimization
Practical tips for maximizing MediaGit throughput and minimizing storage costs.
Parallel Add
The single biggest performance lever. By default, MediaGit uses all available CPU cores.
# Let MediaGit choose (default: all CPUs)
mediagit add assets/
# Explicit job count
mediagit add --jobs 16 assets/
# Disable parallelism (for debugging or resource-constrained systems)
mediagit add --no-parallel assets/
Expected throughput (validated benchmarks, release build):
| File type | Throughput | Notes |
|---|---|---|
| PSD (72–181 MB) | 72–119 MB/s | Zstd Best; layer data compresses well |
| MP4/MOV (5–398 MB) | 146–174 MB/s | Pre-compressed; store-mode, zero CPU overhead |
| GLB (14–25 MB) | 3.0–5.2 MB/s | GLB parser + CDC chunking + Zstd |
| WAV (55–57 MB) | 2.1–3.6 MB/s | RIFF parser + chunking (CPU-bound) |
| Pre-compressed (JPEG, USDZ) | 25–182 MB/s | Direct write, no chunking |
Compression Strategy
MediaGit automatically selects the best compression strategy per file type. You can tune the global defaults:
# .mediagit/config.toml
[compression]
algorithm = "zstd"
level = 3 # 1 (fast) → 22 (best). Default 3 is optimal for most cases.
min_size = 1024 # Don't compress files smaller than 1 KB
Format-Specific Behavior
MediaGit never wastes CPU re-compressing already-compressed formats:
| Format | Strategy | Reason |
|---|---|---|
| JPEG, PNG, WebP | Store (level 0) | Already compressed |
| MP4, MOV, AVI | Store | Already compressed |
| ZIP, DOCX, XLSX | Store | ZIP container |
| PDF, AI, InDesign | Store | Contains compressed streams |
| PSD | Zstd Best | Raw layer data compresses well |
| OBJ, FBX, GLB, STL | Zstd Best | Binary 3D data |
| WAV, FLAC | Zstd Default | Uncompressed audio |
| Text, JSON, TOML | Zstd Default | Highly compressible |
Delta Encoding
For versioned files that change incrementally (e.g., evolving PSD files), MediaGit uses delta encoding to store only the differences between versions:
# Similarity thresholds (in smart_compressor.rs — not yet configurable via TOML)
# AI/PDF files: 15% similarity → try delta encoding
# Office docs: 20% similarity → try delta encoding
# General: 80% similarity threshold
Delta chains are capped at depth 10 to prevent slow reads on deeply-chained objects.
Chunking
Large files are split into chunks for efficient deduplication and parallel transfer. MediaGit uses different chunkers per file type:
| File size / type | Chunker | Typical chunk count |
|---|---|---|
| < 10 MB | FastCDC (small) | 2–10 |
| 10–100 MB | FastCDC (medium) | 10–100 |
| > 100 MB | StreamCDC | 100–2000 |
| MP4 / MKV / WebM | Video container-aware | 1 per GOP |
| WAV | Audio-aware | Fixed-size segments |
| PSD | Layer-aware | 1 per layer group |
Deduplication: Identical chunks across files or versions are stored only once. For a 6 GB CSV dataset, this yielded 83% storage savings in testing.
Storage Backend Performance
Cloud backend upload speeds depend on network, not MediaGit:
| Backend | Upload | Download | Notes |
|---|---|---|---|
| Local filesystem | 200–500 MB/s | 200–500 MB/s | Limited by disk I/O |
| MinIO (local) | 100–300 MB/s | 200–500 MB/s | Validated: 108 MB/s upload |
| Amazon S3 | 50–200 MB/s | 100–400 MB/s | Depends on region + instance |
| Azure Blob | 50–150 MB/s | 100–300 MB/s | |
| Google Cloud Storage | 50–200 MB/s | 100–400 MB/s |
S3 Transfer Optimization
[performance]
max_concurrency = 32 # More parallel uploads
[performance.connection_pool]
max_connections = 32
[performance.timeouts]
request = 300 # 5 min for very large files
write = 120
Memory Usage
Cache settings control how much object data MediaGit keeps in memory:
[performance.cache]
enabled = true
max_size = 1073741824 # 1 GB (for large repos)
ttl = 7200 # 2 hours
For workstations with < 8 GB RAM, reduce to 256 MB:
max_size = 268435456 # 256 MB
Repository Maintenance
Garbage Collection
Run after many branch deletions or partial operations:
mediagit gc
GC removes unreferenced objects and repacks data. Safe to run any time.
Verify Integrity
# Quick check (metadata only)
mediagit fsck
# Full cryptographic verification
mediagit verify
Statistics
mediagit stats
Shows compression ratio, deduplication rate, object count, and chunk distribution by file category.
Profiling
For investigating performance bottlenecks in development:
# Enable trace-level logging
RUST_LOG=mediagit_versioning=trace mediagit add large-file.psd
# Benchmark specific operations
cargo bench --workspace -p mediagit-compression
CI/CD Performance Tips
- Cache the binary: Download once, cache with
actions/cache, skip re-download on subsequent runs - Parallel jobs: Match
--jobsto the CI runner’s CPU count (nprocon Linux) - Avoid re-verifying in CI:
mediagit fsckis fast;mediagit verifydoes full SHA-256 re-reads and is slower - Use regional buckets: Place S3 buckets in the same region as your CI runners