Core Concepts
MediaGit-Core is built on several fundamental concepts that enable efficient version control for large media files.
Content-Addressable Storage (CAS)
What is CAS?
Content-addressable storage identifies objects by their content (via cryptographic hash) rather than by name or location.
How MediaGit Uses CAS
File Content → SHA-256 Hash → Object ID (OID)
"hello\n" → 5891b5b522... → objects/58/91b5b522...
Every object (blob, tree, commit) is stored under its SHA-256 hash:
- Blobs: Raw file content
- Trees: Directory listings
- Commits: Snapshots with metadata
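A minimal sketch of how a hex OID maps onto the `objects/` fan-out layout in the `objects/58/91b5b522...` example, assuming the first two hex characters name the directory and the remaining 62 name the file. `object_path` is a hypothetical helper for illustration, not MediaGit's API, and it accepts only lowercase hex.

```rust
use std::path::PathBuf;

/// Hypothetical helper: maps a 64-character SHA-256 hex OID to its
/// fan-out location, e.g. "5891b5..." -> objects/58/91b5...
fn object_path(oid: &str) -> Option<PathBuf> {
    // A SHA-256 digest is 32 bytes = 64 lowercase hex characters.
    let valid = oid.len() == 64
        && oid.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if !valid {
        return None;
    }
    // First two hex chars become the directory; the remainder is the file name.
    Some(PathBuf::from("objects").join(&oid[..2]).join(&oid[2..]))
}
```

Splitting on a two-character prefix caps each directory at 256 siblings, which keeps directory listings short even with millions of objects.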
Benefits
- Automatic Deduplication: Identical files stored only once
- Integrity Verification: Hash mismatch = corruption detected
- Distributed Sync: Objects identifiable across repositories
- Efficient Transfers: Only send missing objects
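The deduplication and integrity benefits can be illustrated with a tiny in-memory store. This is a sketch under a loud assumption: std's `DefaultHasher` stands in for SHA-256 only so the example runs without external crates; it is not cryptographic and must not be used for real content addressing. `CasStore` and `content_id` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. MediaGit uses SHA-256; DefaultHasher is used here
/// only so the sketch has no dependencies (NOT collision-resistant).
fn content_id(data: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Hypothetical in-memory content-addressable store.
#[derive(Default)]
struct CasStore {
    objects: HashMap<String, Vec<u8>>,
}

impl CasStore {
    /// Stores `data` under its content ID; identical content is kept once.
    fn put(&mut self, data: &[u8]) -> String {
        let id = content_id(data);
        self.objects.entry(id.clone()).or_insert_with(|| data.to_vec());
        id
    }

    /// Returns the bytes only if they still hash to the requested ID,
    /// so silent corruption surfaces as an error instead of bad data.
    fn get(&self, id: &str) -> Result<&[u8], String> {
        let data = self
            .objects
            .get(id)
            .ok_or_else(|| format!("missing object {id}"))?;
        if content_id(data) != id {
            return Err(format!("hash mismatch for {id}: object is corrupt"));
        }
        Ok(data.as_slice())
    }
}
```

Because the key is derived from the bytes, a second `put` of the same content is a no-op, and re-hashing on read is all it takes to detect corruption.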
Object Model
MediaGit uses a four-object model inspired by Git:
graph LR
Commit[Commit Object] --> Tree[Tree Object]
Tree --> Blob1[Blob: file1.psd]
Tree --> Blob2[Blob: file2.mp4]
Tree --> SubTree[Tree: subfolder/]
SubTree --> Blob3[Blob: file3.wav]
Commit --> Parent[Parent Commit]
style Commit fill:#e1f5ff
style Tree fill:#fff4e1
style Blob1 fill:#e8f5e9
style Blob2 fill:#e8f5e9
style Blob3 fill:#e8f5e9
Blob Objects
- Purpose: Store raw file content
- Properties: No filename, no metadata, just bytes
- Example:
contents of large-file.psd → compressed bytes (the filename is recorded in the parent tree, not in the blob)
Tree Objects
- Purpose: Represent directories
- Contains: List of blobs and sub-trees with filenames and modes
- Example:
100644 blob a3c5d... README.md
100644 blob f7e2a... large-file.psd
040000 tree b8f3c... assets/
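To make the tree listing concrete, here is a sketch that models one entry and round-trips it through the `MODE KIND OID NAME` line format. `TreeEntry`, `EntryKind`, and `parse_entry` are hypothetical, and whether MediaGit serializes trees in exactly this text form (Git itself uses a binary encoding) is an assumption.

```rust
use std::fmt;

/// Hypothetical tree-entry model mirroring the listing above.
#[derive(Debug, Clone, PartialEq)]
enum EntryKind {
    Blob,
    Tree,
}

#[derive(Debug, Clone, PartialEq)]
struct TreeEntry {
    mode: u32,       // octal file mode, e.g. 0o100644 or 0o040000
    kind: EntryKind, // blob (file) or tree (sub-directory)
    oid: String,     // hex object ID of the blob or sub-tree
    name: String,    // the filename lives here, not in the blob
}

impl fmt::Display for TreeEntry {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let kind = match self.kind {
            EntryKind::Blob => "blob",
            EntryKind::Tree => "tree",
        };
        // {:06o} zero-pads the octal mode to six digits (040000, 100644).
        write!(f, "{:06o} {} {} {}", self.mode, kind, self.oid, self.name)
    }
}

/// Parses one "MODE KIND OID NAME" line; the name may contain spaces.
fn parse_entry(line: &str) -> Option<TreeEntry> {
    let mut parts = line.splitn(4, ' ');
    let mode = u32::from_str_radix(parts.next()?, 8).ok()?;
    let kind = match parts.next()? {
        "blob" => EntryKind::Blob,
        "tree" => EntryKind::Tree,
        _ => return None,
    };
    let oid = parts.next()?.to_string();
    let name = parts.next()?.to_string();
    Some(TreeEntry { mode, kind, oid, name })
}
```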
Commit Objects
- Purpose: Snapshot of repository at a point in time
- Contains:
- Root tree hash
- Parent commit(s) hash(es)
- Author signature (name, email, timestamp)
- Committer signature
- Commit message
- Example:
tree f3a5d...
parent e8c2b...
author Alice <alice@example.com> 1699564800
committer Alice <alice@example.com> 1699564800

Add updated PSD with new layers
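A sketch of how a commit with the fields listed above could be rendered into that text layout: headers one per line, a blank line, then the message. `Commit` and `Signature` are hypothetical types; the exact byte framing MediaGit hashes to produce the commit OID is an assumption.

```rust
/// Hypothetical signature: name, email, and Unix timestamp (seconds).
struct Signature {
    name: String,
    email: String,
    timestamp: i64,
}

/// Hypothetical commit model; field order follows the example above.
struct Commit {
    tree: String,         // OID of the root tree
    parents: Vec<String>, // none for a root commit, two or more for a merge
    author: Signature,
    committer: Signature,
    message: String,
}

impl Signature {
    fn render(&self) -> String {
        format!("{} <{}> {}", self.name, self.email, self.timestamp)
    }
}

impl Commit {
    /// Renders headers one per line, a blank line, then the message.
    /// Under the CAS model, hashing these bytes yields the commit's OID.
    fn serialize(&self) -> String {
        let mut out = format!("tree {}\n", self.tree);
        for p in &self.parents {
            out.push_str(&format!("parent {p}\n"));
        }
        out.push_str(&format!("author {}\n", self.author.render()));
        out.push_str(&format!("committer {}\n", self.committer.render()));
        out.push('\n');
        out.push_str(&self.message);
        out.push('\n');
        out
    }
}
```

Since the parent OIDs are part of the hashed bytes, changing any ancestor changes every descendant's OID, which is what makes history tamper-evident.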
Reference Objects
- Purpose: Human-readable names for commits
- Types:
- refs/heads/main → points to the latest commit on the main branch
- refs/tags/v1.0 → points to a tagged release commit
- HEAD → symbolic ref to the current branch
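Resolving a name such as `HEAD` means following symbolic refs until a commit hash is reached. The sketch below assumes Git's `ref: <target>` convention for symbolic refs and an in-memory map of ref contents; both `resolve` and that encoding are hypothetical here.

```rust
use std::collections::HashMap;

/// Hypothetical ref resolution: each ref holds either a commit OID or,
/// for symbolic refs like HEAD, "ref: <target>" (Git's convention, assumed).
fn resolve(refs: &HashMap<String, String>, name: &str) -> Option<String> {
    let mut current = name.to_string();
    // Follow symbolic refs, with a small hop limit to guard against cycles.
    for _ in 0..5 {
        let value = refs.get(&current)?;
        match value.strip_prefix("ref: ") {
            Some(target) => current = target.to_string(),
            None => return Some(value.clone()),
        }
    }
    None
}
```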
Delta Encoding
MediaGit uses zstd dictionary-based delta encoding to store only differences between versions:
When Deltas Are Used
- Scenario: Large file with small changes
- Strategy: Store base version + delta to new version
- Benefit: Significant storage savings (33–83% reduction depending on format; validated March 2026)
Delta Chain Example
Version 1: project.psd (100 MB) → Stored as full blob
Version 2: project.psd (100 MB) → Stored as delta from v1 (5 MB)
Version 3: project.psd (100 MB) → Stored as delta from v2 (3 MB)
Total storage: 108 MB instead of 300 MB (64% reduction)
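The storage arithmetic in the chain example can be checked directly: one full base plus each delta, compared against storing every version in full. The helper names are hypothetical.

```rust
/// Total storage for a delta chain: one full base plus each delta.
fn chain_storage_mb(base_mb: u64, deltas_mb: &[u64]) -> u64 {
    base_mb + deltas_mb.iter().sum::<u64>()
}

/// Percentage saved versus storing every version as a full copy.
fn reduction_percent(stored_mb: u64, full_copies_mb: u64) -> f64 {
    100.0 * full_copies_mb.saturating_sub(stored_mb) as f64 / full_copies_mb as f64
}
```

For the example above, `chain_storage_mb(100, &[5, 3])` gives 108, and `reduction_percent(108, 300)` gives 64.0, matching the stated 64% reduction.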
Delta Chain Limits
- Maximum chain depth: 10 (MAX_DELTA_DEPTH)
- Once the depth is exceeded, the next version is stored as a new full base
- mediagit gc optimizes long chains
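A sketch of the depth cap, assuming each version deltas against its immediate predecessor (as in the chain example) and that "depth" counts deltas since the last full base. `Stored` and `plan_chain` are hypothetical; only `MAX_DELTA_DEPTH = 10` comes from the text.

```rust
const MAX_DELTA_DEPTH: u32 = 10;

#[derive(Debug, PartialEq)]
enum Stored {
    Base,                 // full blob, starts a new chain
    Delta { depth: u32 }, // delta against the previous version
}

/// Plans how `versions` successive revisions of one file would be stored
/// if each version deltas against its predecessor until the cap is hit.
fn plan_chain(versions: usize) -> Vec<Stored> {
    let mut plan = Vec::with_capacity(versions);
    let mut depth = 0;
    for i in 0..versions {
        if i == 0 || depth >= MAX_DELTA_DEPTH {
            // First version, or the chain is full: store a new base.
            depth = 0;
            plan.push(Stored::Base);
        } else {
            depth += 1;
            plan.push(Stored::Delta { depth });
        }
    }
    plan
}
```

Capping the depth bounds read cost: reconstructing any version applies at most ten deltas to its base.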
Compression Strategy
MediaGit employs intelligent compression based on file type:
Compression Algorithms
- zstd (default): Fastest of the three, good general-purpose ratio
- brotli: Better ratio for text/code, slower
- delta: Zstd dictionary compression (chunk-level)
Automatic Selection
.psd, .psb → zstd level 3 (preserve layers)
.mp4, .mov → store (already compressed)
.txt, .md → brotli level 6 (high text compression)
.blend → zstd + delta (frequently updated 3D scenes)
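The extension table can be expressed as a simple lookup. This sketch mirrors only the four rows above; the `Codec` enum is hypothetical, and falling back to zstd level 3 for unlisted extensions is an assumption based on zstd being the default.

```rust
use std::path::Path;

/// Hypothetical codec choice mirroring the extension table above.
#[derive(Debug, PartialEq)]
enum Codec {
    Store,       // already-compressed formats: write bytes as-is
    Zstd(i32),   // zstd at the given level
    Brotli(u32), // brotli at the given quality
    ZstdDelta,   // zstd plus dictionary delta against a prior version
}

fn codec_for(path: &str) -> Codec {
    // Match on the lowercased extension so "PSD" and "psd" agree.
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "psd" | "psb" => Codec::Zstd(3),
        "mp4" | "mov" => Codec::Store,
        "txt" | "md" => Codec::Brotli(6),
        "blend" => Codec::ZstdDelta,
        // Assumed fallback: the default zstd level 3.
        _ => Codec::Zstd(3),
    }
}
```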
Compression Levels
- Fast: Quick compression, lower ratio (zstd level 1)
- Default: Balanced speed/ratio (zstd level 3)
- Best: Maximum compression, slower (zstd level 19)
Branching Model
MediaGit supports lightweight branches similar to Git:
Branch Storage
- Branches are just files in refs/heads/
- Each file contains a commit hash (64 hex characters for SHA-256)
- Creating a branch = writing one 64-character hash to a file (instant)
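A sketch of branch creation as a single small file write. `create_branch` is hypothetical, and the trailing newline after the hash is an assumption borrowed from Git; the text only states that the file holds the 64-character hash.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: creates a branch by writing the commit's 64-char
/// SHA-256 hex OID (plus an assumed trailing newline) to refs/heads/<name>.
fn create_branch(repo: &Path, name: &str, commit_oid: &str) -> io::Result<()> {
    if commit_oid.len() != 64 || !commit_oid.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "expected a 64-character hex OID",
        ));
    }
    let path = repo.join("refs").join("heads").join(name);
    if let Some(dir) = path.parent() {
        // Names like "feature/x" need intermediate directories.
        fs::create_dir_all(dir)?;
    }
    fs::write(&path, format!("{commit_oid}\n"))
}
```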
Branch Visualization
gitGraph
commit id: "Initial commit"
commit id: "Add base assets"
branch feature-new-character
checkout feature-new-character
commit id: "Character model v1"
commit id: "Character textures"
checkout main
commit id: "Update README"
checkout feature-new-character
commit id: "Character animations"
checkout main
merge feature-new-character
commit id: "Release v1.0"
Branch Protection
- Branches can be marked as protected
- Protected branches require merge requests
- Prevents force-push and deletion
Merge Strategies
MediaGit provides multiple merge strategies:
1. Fast-Forward Merge
- When: The current branch tip is a direct ancestor of the branch being merged
- Action: Just move the branch pointer forward
- Result: Linear history, no merge commit
- Use: Merging a feature branch when the target branch has not moved since the branch point
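The fast-forward condition reduces to a reachability test on the commit graph. This sketch models the graph as a hypothetical in-memory map from each commit to its parents.

```rust
use std::collections::{HashMap, HashSet};

/// Commit graph as commit -> parents (hypothetical in-memory model).
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// All commits reachable from `start`, including `start` itself.
fn ancestors<'a>(graph: &Graph<'a>, start: &'a str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![start];
    while let Some(c) = stack.pop() {
        if seen.insert(c) {
            if let Some(parents) = graph.get(c) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    seen
}

/// A fast-forward is possible when the current tip is already reachable
/// from the incoming tip, so the branch pointer can simply move forward.
fn can_fast_forward<'a>(graph: &Graph<'a>, current_tip: &'a str, incoming_tip: &'a str) -> bool {
    ancestors(graph, incoming_tip).contains(current_tip)
}
```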
2. Three-Way Merge
- When: Branches have diverged
- Action: Find the merge base (Lowest Common Ancestor, LCA), then apply both branches' changes relative to it
- Result: Merge commit with two parents
- Use: Concurrent work on different files
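Finding the LCA can be sketched directly from its definition: take the commits reachable from both tips, then keep only those that are not ancestors of another common ancestor. The graph model is hypothetical, and this quadratic version favors clarity over the optimized searches real tools use.

```rust
use std::collections::{HashMap, HashSet};

/// Commit graph as commit -> parents (hypothetical in-memory model).
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// All commits reachable from `start`, including `start` itself.
fn ancestors<'a>(graph: &Graph<'a>, start: &'a str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![start];
    while let Some(c) = stack.pop() {
        if seen.insert(c) {
            if let Some(parents) = graph.get(c) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    seen
}

/// Lowest common ancestors of `a` and `b`: common ancestors that are not
/// themselves ancestors of another common ancestor. Usually exactly one.
fn merge_bases<'a>(graph: &Graph<'a>, a: &'a str, b: &'a str) -> Vec<&'a str> {
    let common: HashSet<&str> = ancestors(graph, a)
        .intersection(&ancestors(graph, b))
        .copied()
        .collect();
    let mut bases: Vec<&str> = common
        .iter()
        .copied()
        .filter(|&c| {
            !common
                .iter()
                .any(|&other| other != c && ancestors(graph, other).contains(c))
        })
        .collect();
    bases.sort();
    bases
}
```

In criss-cross histories this can return more than one base; how MediaGit resolves that case is not described here.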
3. Media-Aware Merge
- When: Merging structured media (PSD, video, audio)
- Action: Parse file format, merge layers/tracks/channels
- Result: Merged media file preserving structure
- Use: Collaborative media editing
4. Rebase
- When: Want linear history
- Action: Replay commits on top of target branch
- Result: No merge commit, clean history
- Use: Preparing feature branch for merge
Conflict Resolution
Text Conflicts
Layer 1: Blue Background
Media Conflicts
MediaGit detects conflicting layers in PSD files:
Conflict in large-file.psd:
- Layer "Background" modified in both branches
- Your version: Blue (#0000FF)
- Their version: Red (#FF0000)
Resolution options:
1. Keep yours (blue)
2. Keep theirs (red)
3. Manual merge (open in Photoshop)
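The layer conflict above is what a three-way merge yields when applied per layer: a side that left a layer unchanged relative to the merge base yields to the side that changed it, and only layers changed differently on both sides conflict. The `Layers` model (layer name to a content value) is hypothetical; MediaGit's PSD parsing is not shown in this section.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical layer model: layer name -> content fingerprint/value.
type Layers = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
struct LayerConflict {
    layer: String,
    ours: Option<String>,
    theirs: Option<String>,
}

/// Per-layer three-way merge against the common base version.
fn merge_layers(base: &Layers, ours: &Layers, theirs: &Layers) -> (Layers, Vec<LayerConflict>) {
    let names: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Layers::new();
    let mut conflicts = Vec::new();
    for name in names {
        let (b, o, t) = (base.get(name), ours.get(name), theirs.get(name));
        let pick = if o == t {
            o // both sides agree (or both deleted the layer)
        } else if o == b {
            t // only theirs changed
        } else if t == b {
            o // only ours changed
        } else {
            conflicts.push(LayerConflict {
                layer: name.clone(),
                ours: o.cloned(),
                theirs: t.cloned(),
            });
            continue;
        };
        if let Some(v) = pick {
            merged.insert(name.clone(), v.clone());
        }
    }
    (merged, conflicts)
}
```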
Storage Abstraction
MediaGit separates storage interface from implementation:
graph TB
App[Application Code] --> Trait[Backend Trait]
Trait --> Local[LocalBackend]
Trait --> S3[S3Backend]
Trait --> Azure[AzureBackend]
Trait --> GCS[GCSBackend]
Trait --> B2[B2Backend]
Trait --> MinIO[MinIOBackend]
Trait --> Spaces[SpacesBackend]
style Trait fill:#e1f5ff
Backend Trait
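use async_trait::async_trait;

// `Result` is the crate's error-carrying result alias (not shown in this section).
#[async_trait]
pub trait Backend: Send + Sync {
    /// Fetch the bytes stored under `key`.
    async fn get(&self, key: &str) -> Result<Vec<u8>>;
    /// Store `data` under `key`.
    async fn put(&self, key: &str, data: &[u8]) -> Result<()>;
    /// Check whether `key` is present.
    async fn exists(&self, key: &str) -> Result<bool>;
    /// Remove the object at `key`.
    async fn delete(&self, key: &str) -> Result<()>;
    /// List keys that start with `prefix`.
    async fn list(&self, prefix: &str) -> Result<Vec<String>>;
}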
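To show the "mock backends for unit tests" idea, here is a std-only sketch of an in-memory implementation. Assumptions: it mirrors the trait with native `async fn` in traits (Rust 1.75+) rather than `#[async_trait]`, uses a `String` error as a stand-in for the crate's `Result`, and drives futures with a minimal `block_on` that only suits futures that never wait on I/O. `MemoryBackend` and `block_on` are hypothetical; real code would run on a proper async runtime.

```rust
use std::collections::BTreeMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Stand-in error type for this sketch only.
type Result<T> = std::result::Result<T, String>;

// Std-only mirror of the Backend trait (native async fn, no extra crates).
trait Backend: Send + Sync {
    async fn get(&self, key: &str) -> Result<Vec<u8>>;
    async fn put(&self, key: &str, data: &[u8]) -> Result<()>;
    async fn exists(&self, key: &str) -> Result<bool>;
    async fn delete(&self, key: &str) -> Result<()>;
    async fn list(&self, prefix: &str) -> Result<Vec<String>>;
}

/// Hypothetical in-memory backend, usable as a test double.
#[derive(Default)]
struct MemoryBackend {
    objects: Mutex<BTreeMap<String, Vec<u8>>>,
}

impl Backend for MemoryBackend {
    async fn get(&self, key: &str) -> Result<Vec<u8>> {
        let map = self.objects.lock().unwrap();
        map.get(key).cloned().ok_or_else(|| format!("not found: {key}"))
    }
    async fn put(&self, key: &str, data: &[u8]) -> Result<()> {
        self.objects.lock().unwrap().insert(key.to_string(), data.to_vec());
        Ok(())
    }
    async fn exists(&self, key: &str) -> Result<bool> {
        Ok(self.objects.lock().unwrap().contains_key(key))
    }
    async fn delete(&self, key: &str) -> Result<()> {
        self.objects.lock().unwrap().remove(key);
        Ok(())
    }
    async fn list(&self, prefix: &str) -> Result<Vec<String>> {
        let map = self.objects.lock().unwrap();
        Ok(map.keys().filter(|k| k.starts_with(prefix)).cloned().collect())
    }
}

/// Waker that does nothing; fine because these futures never return Pending.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls the future until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Code written against the trait can then be exercised with `MemoryBackend` in tests and a cloud backend in production without changes.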
Benefits
- Testability: Mock backends for unit tests
- Flexibility: Swap backends without code changes
- Extensibility: Add new backends by implementing trait
Garbage Collection
Over time, unreachable objects accumulate (orphaned by branch deletion, rebases, etc.).
What Gets Collected
- Objects not reachable from any branch or tag
- Dangling blobs from incomplete operations
- Long delta chains (recompressed)
GC Process
- Mark Phase: Traverse from all refs, mark reachable objects
- Sweep Phase: Delete unmarked objects
- Repack Phase: Optimize delta chains, recompress
Safety
- Preserves recent objects (default: 2 weeks grace period)
- Dry-run mode to preview deletions
- Backup recommended before aggressive GC
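The mark and sweep phases, together with the grace period and dry-run mode, can be sketched over a toy object graph. The `Obj` record (child OIDs plus an age in days) is hypothetical, and the repack phase is not shown; the 14-day grace period in the usage matches the stated 2-week default.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical object record: referenced OIDs and age in days.
struct Obj {
    children: Vec<&'static str>, // commit -> tree/parents, tree -> entries
    age_days: u32,
}

/// Mark: walk from every ref tip and collect reachable OIDs.
fn mark(objects: &HashMap<&'static str, Obj>, roots: &[&'static str]) -> HashSet<&'static str> {
    let mut marked = HashSet::new();
    let mut stack = roots.to_vec();
    while let Some(oid) = stack.pop() {
        if marked.insert(oid) {
            if let Some(obj) = objects.get(oid) {
                stack.extend(obj.children.iter().copied());
            }
        }
    }
    marked
}

/// Sweep: unreachable objects older than the grace period are removed.
/// With `dry_run`, candidates are only reported, never deleted.
fn sweep(
    objects: &mut HashMap<&'static str, Obj>,
    roots: &[&'static str],
    grace_days: u32,
    dry_run: bool,
) -> Vec<&'static str> {
    let marked = mark(objects, roots);
    let mut doomed: Vec<&'static str> = objects
        .iter()
        .filter(|(oid, obj)| !marked.contains(*oid) && obj.age_days > grace_days)
        .map(|(oid, _)| *oid)
        .collect();
    doomed.sort();
    if !dry_run {
        for oid in &doomed {
            objects.remove(oid);
        }
    }
    doomed
}
```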
Repository Structure
.mediagit/
├── config # Repository configuration
├── HEAD # Current branch pointer
├── refs/
│ ├── heads/ # Branch pointers
│ │ ├── main
│ │ └── feature-branch
│ └── tags/ # Tag pointers
│ └── v1.0
├── objects/ # Object database (CAS)
│ ├── 58/
│ │ └── 91b5b522... # Blob object
│ └── f3/
│ └── a5d3c1e8... # Tree object
└── logs/ # Reflog (operation history)
└── HEAD
Performance Considerations
Object Packing
- Small objects stored individually
- Large objects chunked for streaming
- Frequently accessed objects cached in memory
Network Optimization
- Object transfer uses HTTP/2 multiplexing
- Parallel object fetch (configurable workers)
- Automatic retry with exponential backoff
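Exponential backoff doubles the wait after each failed attempt, up to a cap. This sketch injects the `sleep` function so the schedule can be tested without waiting; `retry`, `backoff_delay`, and their parameters are hypothetical, and production retry logic commonly adds random jitter, which is omitted here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(base: Duration, attempt: u32, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Runs `op` up to `max_attempts` times, sleeping between failures.
/// `sleep` is injected so tests can record delays instead of waiting.
fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(base, attempt, cap));
                attempt += 1;
            }
        }
    }
}
```

In real use, `sleep` would be `std::thread::sleep` (or an async timer), so an operation that fails twice waits 100 ms and then 200 ms with a 100 ms base.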
Disk I/O
- Async I/O prevents thread blocking
- Memory-mapped files for large objects
- Sequential writes for optimal SSD performance