Delta Compression Guide
Complete guide to understanding and optimizing delta compression in MediaGit.
What is Delta Compression?
Delta compression stores only the differences between file versions instead of complete copies:
Traditional Storage:
v1.psd: 500 MB
v2.psd: 500 MB (full copy)
v3.psd: 500 MB (full copy)
Total: 1,500 MB
Delta Compression:
v1.psd: 500 MB (base)
v2.psd: 15 MB (delta from v1)
v3.psd: 8 MB (delta from v2)
Total: 523 MB (65% savings!)
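The 65% figure can be checked with a small Rust sketch. The helper name `storage_savings` is illustrative and not part of MediaGit; sizes are in megabytes.

```rust
/// Fraction of storage saved by keeping a base plus deltas instead of
/// a full copy of every version. Illustrative helper, not a MediaGit API.
fn storage_savings(full_copies_mb: &[f64], stored_mb: &[f64]) -> f64 {
    let full: f64 = full_copies_mb.iter().sum();
    let stored: f64 = stored_mb.iter().sum();
    if full == 0.0 {
        return 0.0; // nothing stored, nothing saved
    }
    1.0 - stored / full
}
```

With the sizes above, `storage_savings(&[500.0, 500.0, 500.0], &[500.0, 15.0, 8.0])` evaluates 1 - 523/1500, which is about 0.651.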
How MediaGit Applies Delta Compression
Automatic Detection
MediaGit automatically applies delta compression based on:
- File size - Must be >10MB
- File similarity - Content similarity above threshold
- File type - Media-aware thresholds
- Savings check - Delta must be <90% of full size
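The four checks above can be sketched as a single decision function. This is a minimal sketch, not MediaGit's implementation: the `Candidate` and `Policy` types, their field names, and the handling of exact boundary values are all assumptions made for illustration.

```rust
/// Hypothetical description of a new file version and its candidate delta.
struct Candidate {
    full_size: u64,  // size of the complete file, in bytes
    delta_size: u64, // size of the delta against the previous version, in bytes
    similarity: f64, // 0.0 = completely different, 1.0 = identical
}

/// Hypothetical policy mirroring the criteria listed above.
struct Policy {
    min_size: u64,    // files must be larger than this (10 MB in the guide)
    threshold: f64,   // media-aware similarity threshold for the file type
    min_savings: f64, // 0.1 means the delta must be under 90% of full size
}

fn should_use_delta(c: &Candidate, p: &Policy) -> bool {
    // File size: must exceed the minimum.
    if c.full_size <= p.min_size {
        return false;
    }
    // File similarity: assumed to pass when it meets or exceeds the threshold.
    if c.similarity < p.threshold {
        return false;
    }
    // Savings check: the delta must be smaller than (1 - min_savings) of full size.
    let full = c.full_size as f64;
    let delta = c.delta_size as f64;
    delta < (1.0 - p.min_savings) * full
}
```

A 500 MB file with a 15 MB delta and similarity 0.89 passes all checks under a 0.70 threshold, while an 8 MB file is rejected on size alone.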
Similarity Thresholds by File Type
| File Type | Threshold | Behavior |
|---|---|---|
| AI/PDF/InDesign | 0.15 | Very aggressive (compressed streams, structural similarity) |
| DOCX/XLSX/PPTX (Office) | 0.20 | Aggressive (ZIP containers, shared structure) |
| MP4/MOV (Video) | 0.50 | Moderate (metadata/timeline changes) |
| WAV/AIF (Audio) | 0.65 | Medium (clip edits) |
| PSD/JPG/PNG (Images) | 0.70 | Moderate (perceptual similarity) |
| FBX/OBJ/BLEND (3D Models) | 0.70 | Moderate (geometry changes) |
| TXT/Code | 0.85 | Conservative (small changes matter) |
| JSON/YAML/TOML (Config) | 0.95 | Very conservative (exact matches preferred) |
| Default | 0.30 | Global minimum (MIN_SIMILARITY_THRESHOLD) |
Lower threshold = more files use delta compression.
Higher threshold = only very similar files use delta compression.
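As a rough illustration of how the table might be consulted, the sketch below maps a file extension to its threshold. The function names are hypothetical, the `indd` extension for InDesign is an assumption, and code file extensions are omitted because the table does not list them.

```rust
/// Similarity threshold for a file extension, using the values from the
/// table above. Hypothetical helper, not MediaGit's lookup code.
fn threshold_for(extension: &str) -> f64 {
    match extension.to_ascii_lowercase().as_str() {
        "ai" | "pdf" | "indd" => 0.15, // "indd" for InDesign is an assumption
        "docx" | "xlsx" | "pptx" => 0.20,
        "mp4" | "mov" => 0.50,
        "wav" | "aif" => 0.65,
        "psd" | "jpg" | "png" => 0.70,
        "fbx" | "obj" | "blend" => 0.70,
        "txt" => 0.85,
        "json" | "yaml" | "toml" => 0.95,
        _ => 0.30, // default
    }
}

/// True when a measured similarity qualifies the file for delta compression.
/// Treating "equal to the threshold" as passing is an assumption.
fn uses_delta(extension: &str, similarity: f64) -> bool {
    similarity >= threshold_for(extension)
}
```

For the same measured similarity of 0.60, an MP4 (threshold 0.50) would qualify while a WAV (threshold 0.65) would not.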
Checking Delta Status
Show Delta Information
# Show file storage info
$ mediagit show --stat large-file.psd
Object: 5891b5b522d5df086d...
Type: blob (delta)
Size: 15.3 MB (delta)
Base: a3c5d7e2f1b8c9a4d... (500 MB)
Compression ratio: 96.9%
Delta chain depth: 3
List Objects with Delta Info
$ mediagit stats --verbose
Object database statistics:
Total objects: 8,875
Loose objects: 247
Packed objects: 8,628
Delta statistics:
Objects with deltas: 2,847 (32%)
Average chain depth: 4.2
Max chain depth: 12
Total delta savings: 3.2 GB (78%)
Configuring Delta Compression
Global Configuration
Edit .mediagit/config:
[compression.delta]
# Enable automatic delta compression
enabled = true
# Minimum file size for delta consideration
min_size = "10MB"
# Minimum savings required (10% = 0.1)
min_savings = 0.1
# Maximum delta chain depth before creating new base
max_depth = 10
# Per-file-type similarity thresholds
[compression.delta.thresholds]
psd = 0.70 # Images (perceptual similarity)
psb = 0.70 # Large Photoshop documents
blend = 0.70 # Blender projects
fbx = 0.70 # FBX 3D models
obj = 0.70 # OBJ 3D models
wav = 0.65 # WAV audio
aif = 0.65 # AIF audio
mp4 = 0.50 # MP4 video
mov = 0.50 # QuickTime video
ai = 0.15 # Creative/PDF containers
pdf = 0.15 # PDF containers
default = 0.30 # Global minimum
Adjust Aggressiveness
# Lower values delta more files; higher values delta fewer. The default is 0.30.
# Conservative (fewer deltas)
$ mediagit config set compression.delta.thresholds.default 0.65
# Very conservative (fewest deltas, safest)
$ mediagit config set compression.delta.thresholds.default 0.85
# Disable delta for specific types
$ mediagit config set compression.delta.thresholds.mp4 1.0
Override for Single File
# Force delta compression
$ mediagit add --force-delta large-file.blend
# Disable delta for this file
$ mediagit add --no-delta huge-video.mp4
Optimizing Delta Chains
Understanding Delta Chains
Delta chains form when multiple versions are stored:
Base (v1) → Δ2 → Δ3 → Δ4 → Δ5
Reconstruction requires applying all deltas in sequence:
- Chain depth 1-5: Fast reconstruction (about 0.5 s for a 500 MB file)
- Chain depth 6-10: Good performance (about 1.2 s for a 500 MB file)
- Chain depth >10: New base created automatically
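To make "applying all deltas in sequence" concrete, here is a minimal sketch using a generic copy/insert delta encoding. The `Op` type and both functions are illustrative assumptions; MediaGit's actual delta format is not described in this guide.

```rust
/// One instruction in a simplified delta: copy a byte range from the
/// previous version, or insert literal bytes. Illustrative encoding only.
enum Op {
    Copy { offset: usize, len: usize },
    Insert(Vec<u8>),
}

/// Rebuild one version from the previous version and a delta.
/// Panics if a Copy range falls outside `previous`.
fn apply_delta(previous: &[u8], delta: &[Op]) -> Vec<u8> {
    let mut out = Vec::new();
    for op in delta {
        match op {
            Op::Copy { offset, len } => {
                out.extend_from_slice(&previous[*offset..*offset + *len])
            }
            Op::Insert(bytes) => out.extend_from_slice(bytes),
        }
    }
    out
}

/// Walk the chain from the base, applying each delta to the result of
/// the one before it. The work grows with the chain depth.
fn reconstruct(base: &[u8], chain: &[Vec<Op>]) -> Vec<u8> {
    chain
        .iter()
        .fold(base.to_vec(), |current, delta| apply_delta(&current, delta))
}
```

Because each step needs the full output of the previous step, a chain of depth 5 rebuilds five intermediate versions before producing the requested one, which is why deep chains reconstruct more slowly.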
Check Chain Depth
$ mediagit verify --check-deltas
Analyzing delta chains...
Long chains detected:
assets/scene.blend: depth 52 (slow reconstruction)
images/poster.psd: depth 48
models/character.fbx: depth 45
Recommendation: Run 'mediagit gc --aggressive' to optimize chains
Optimize Chains
# Standard GC (optimizes chains >10 depth)
$ mediagit gc
# Aggressive GC (optimizes chains >20 depth)
$ mediagit gc --aggressive
# Result:
Optimizing delta chains...
Chains optimized: 23
New bases created: 23
Average depth reduced: 52 → 8
Repository size: 485 MB → 467 MB
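The sketch below shows one way chain depth could be measured and over-limit chains selected for a new base. It assumes a hypothetical `parents` map from each delta object to the object it is a delta against, and it assumes the links contain no cycles; neither reflects MediaGit's internal data structures.

```rust
use std::collections::HashMap;

/// Number of deltas between `object` and its full base.
/// Objects missing from `parents` are treated as full bases (depth 0).
/// Assumes the parent links are acyclic.
fn chain_depth(parents: &HashMap<&str, &str>, object: &str) -> usize {
    let mut depth = 0;
    let mut current = object;
    while let Some(parent) = parents.get(current) {
        depth += 1;
        current = *parent;
    }
    depth
}

/// Objects whose chain is deeper than `max_depth`; these are the
/// candidates for being rewritten as new bases.
fn over_limit<'a>(
    parents: &HashMap<&str, &str>,
    objects: &[&'a str],
    max_depth: usize,
) -> Vec<&'a str> {
    objects
        .iter()
        .copied()
        .filter(|object| chain_depth(parents, object) > max_depth)
        .collect()
}
```

With `max_depth` set to 10, as in the configuration shown earlier, only objects more than ten deltas away from a base would be selected.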
Performance Tuning
Parallel Delta Processing
[compression.delta.performance]
# Enable parallel delta encoding
parallel = true
# Number of threads (0 = auto-detect)
threads = 0
# Chunk size for large file delta
chunk_size = "4MB"
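The following sketch illustrates the general idea behind these settings: split the input into fixed-size chunks and spread them across worker threads, with 0 meaning auto-detect. The per-chunk work is a stand-in byte sum rather than real delta encoding, and the function names are hypothetical.

```rust
use std::thread;

/// Resolve the `threads` setting: 0 means use the detected core count.
fn worker_count(threads: usize) -> usize {
    if threads == 0 {
        thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
    } else {
        threads
    }
}

/// Process `data` in chunks of `chunk_size` bytes across worker threads and
/// return one result per chunk, in input order.
/// The byte sum is a placeholder for per-chunk delta encoding.
/// Panics if `chunk_size` is 0.
fn sum_chunks_parallel(data: &[u8], chunk_size: usize, threads: usize) -> Vec<u64> {
    let chunks: Vec<&[u8]> = data.chunks(chunk_size).collect();
    let workers = worker_count(threads);
    // Give each worker a contiguous group of chunks.
    let per_worker = ((chunks.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .chunks(per_worker)
            .map(|group| {
                scope.spawn(move || {
                    group
                        .iter()
                        .map(|chunk| chunk.iter().map(|&b| b as u64).sum::<u64>())
                        .collect::<Vec<u64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Scoped threads let the workers borrow the input slice directly, so no chunk is copied before processing.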
Memory Limits
[compression.delta.memory]
# Maximum memory for delta buffers
max_buffer_size = "512MB"
# Stream large deltas (reduces memory)
streaming_threshold = "100MB"
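As an illustration of how size strings like these could be interpreted, the sketch below parses a value such as "100MB" and compares a delta size against the streaming threshold. It assumes binary units (1 MB = 1024 * 1024 bytes) and that a delta exactly at the threshold is streamed; MediaGit's own parser and boundary rule may differ.

```rust
/// Parse a size such as "4MB" or "512MB" into bytes.
/// Assumes binary units; returns None for unrecognized input.
fn parse_size(text: &str) -> Option<u64> {
    let text = text.trim();
    // Split at the first non-digit character: digits, then the unit.
    let split = text.find(|c: char| !c.is_ascii_digit())?;
    let (number, unit) = text.split_at(split);
    let value: u64 = number.parse().ok()?;
    let factor: u64 = match unit.trim().to_ascii_uppercase().as_str() {
        "B" => 1,
        "KB" => 1 << 10,
        "MB" => 1 << 20,
        "GB" => 1 << 30,
        _ => return None,
    };
    value.checked_mul(factor)
}

/// True when a delta is large enough to be streamed instead of buffered.
fn should_stream(delta_bytes: u64, streaming_threshold: u64) -> bool {
    delta_bytes >= streaming_threshold
}
```

Under these assumptions, a 500 MB delta would be streamed with the 100 MB threshold shown above, while a 15 MB delta would be buffered in memory.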
Troubleshooting
Delta Compression Not Applied
Check file size:
$ ls -lh large-file.psd
-rw-r--r-- 1 user user 8.5M # Too small (<10MB)
Solution: By default, delta applies only to files larger than 10 MB. To include smaller files, lower min_size under [compression.delta].
Check similarity:
$ mediagit show --similarity large-file.psd
Previous version similarity: 0.42 (threshold: 0.70)
Reason: The file changed substantially, so its similarity falls below the threshold.
Solution: None needed. A largely rewritten file gains little from a delta, so it is stored in full.
Slow Reconstruction
Check delta chain depth:
$ mediagit show large-file.psd
Delta chain depth: 87 (very deep!)
Solution: Optimize chains
$ mediagit gc --aggressive
High Memory Usage
Check delta streaming:
[compression.delta.memory]
# Force streaming for large deltas
streaming_threshold = "50MB" # Lower threshold
Best Practices
1. Regular Garbage Collection
# Weekly maintenance
$ mediagit gc
# Monthly aggressive optimization
$ mediagit gc --aggressive
2. Tune for Your Workflow
Photo/Design Work (many small edits):
[compression.delta.thresholds]
psd = 0.80 # Stricter than the 0.70 default; small edits still clear it
blend = 0.80
Video/Audio (large rewrites):
[compression.delta.thresholds]
mp4 = 1.0 # Disable delta
mov = 1.0
wav = 0.95 # Very conservative
3. Monitor Delta Effectiveness
# Check delta savings
$ mediagit stats --delta-report
Delta compression effectiveness:
File type | Files | Avg savings | Total saved
-------------+-------+-------------+-------------
PSD | 1,247 | 92.3% | 2.4 GB
BLEND | 389 | 88.7% | 876 MB
FBX | 156 | 74.2% | 234 MB
WAV | 89 | 45.1% | 89 MB
Other | 203 | 67.8% | 156 MB
-------------+-------+-------------+-------------
Total | 2,084 | 85.4% | 3.75 GB
4. Verify After Major Changes
# After configuration changes
$ mediagit verify --check-deltas
# Ensure chains are healthy
$ mediagit gc --verify
Advanced Topics
Custom Similarity Functions
For specific workflows, you can customize similarity detection (requires building from source):
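// Custom similarity for your file type.
// The body below is placeholder logic: the fraction of bytes that match
// at the same offset. Replace it with logic suited to your format.
fn custom_similarity(old: &[u8], new: &[u8]) -> f64 {
    // Return 0.0-1.0 (0 = completely different, 1 = identical)
    let longest = old.len().max(new.len());
    if longest == 0 {
        return 1.0; // two empty inputs are identical
    }
    let matching = old.iter().zip(new).filter(|(a, b)| a == b).count();
    matching as f64 / longest as f64
}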
Delta Debugging
Enable detailed delta logging:
$ RUST_LOG=mediagit_compression::delta=debug mediagit add large-file.psd
DEBUG mediagit_compression::delta: Calculating similarity...
DEBUG mediagit_compression::delta: Similarity: 0.89 (threshold: 0.70) ✓
DEBUG mediagit_compression::delta: Generating delta...
DEBUG mediagit_compression::delta: Delta size: 15.3 MB (full: 500 MB)
DEBUG mediagit_compression::delta: Savings: 96.9% ✓ (min: 10%)
DEBUG mediagit_compression::delta: Delta compression applied
Performance Benchmarks
Delta Encoding Speed
| File Size | Encoding Time | Throughput |
|---|---|---|
| 10 MB | 0.1s | 100 MB/s |
| 100 MB | 0.8s | 125 MB/s |
| 500 MB | 4.2s | 119 MB/s |
| 1 GB | 8.7s | 115 MB/s |
Reconstruction Speed
| Chain Depth | File Size | Reconstruction Time |
|---|---|---|
| 1-5 | 500 MB | 0.5s (1000 MB/s) |
| 6-10 | 500 MB | 1.2s (417 MB/s) |
| 11-20 | 500 MB | 2.8s (179 MB/s) |
| 21-50 | 500 MB | 6.5s (77 MB/s) |
| >50 | 500 MB | 15s+ (≤33 MB/s) |
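The throughput column in both tables is file size divided by elapsed time. A one-line helper, named here only for illustration, makes the relationship explicit.

```rust
/// Throughput in MB/s for `size_mb` megabytes processed in `seconds`.
fn throughput_mb_per_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}
```

For example, 500 MB reconstructed in 1.2 s is 500 / 1.2, or about 416.7 MB/s, and the same file at 6.5 s drops to about 76.9 MB/s.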