Core Concepts

MediaGit-Core is built on several fundamental concepts that enable efficient version control for large media files.

Content-Addressable Storage (CAS)

What is CAS?

Content-addressable storage identifies objects by their content (via cryptographic hash) rather than by name or location.

How MediaGit Uses CAS

File Content → SHA-256 Hash → Object ID (OID)
   "hello"   →   5891b5b522...  →  objects/58/91b5b522...

Every object (blob, tree, commit) is stored under its SHA-256 hash:

Blobs: Raw file content
Trees: Directory listings
Commits: Snapshots with metadata

Benefits

Automatic Deduplication: Identical files stored only once
Integrity Verification: Hash mismatch = corruption detected
Distributed Sync: Objects identifiable across repositories
Efficient Transfers: Only send missing objects

Object Model

MediaGit uses a four-object model inspired by Git:

graph LR
    Commit[Commit Object] --> Tree[Tree Object]
    Tree --> Blob1[Blob: file1.psd]
    Tree --> Blob2[Blob: file2.mp4]
    Tree --> SubTree[Tree: subfolder/]
    SubTree --> Blob3[Blob: file3.wav]
    Commit --> Parent[Parent Commit]

    style Commit fill:#e1f5ff
    style Tree fill:#fff4e1
    style Blob1 fill:#e8f5e9
    style Blob2 fill:#e8f5e9
    style Blob3 fill:#e8f5e9

Blob Objects

Purpose: Store raw file content
Properties: No filename, no metadata, just bytes
Example: large-file.psd → compressed bytes

Tree Objects

Purpose: Represent directories
Contains: List of blobs and sub-trees with filenames and modes

Example:

100644 blob a3c5d... README.md
100644 blob f7e2a... large-file.psd
040000 tree b8f3c... assets/

Commit Objects

Purpose: Snapshot of repository at a point in time
Contains:
- Root tree hash
- Parent commit(s) hash(es)
- Author signature (name, email, timestamp)
- Committer signature
- Commit message

Example:

tree f3a5d...
parent e8c2b...
author Alice <alice@example.com> 1699564800
committer Alice <alice@example.com> 1699564800

Add updated PSD with new layers

Reference Objects

Purpose: Human-readable names for commits
Types:
- refs/heads/main → points to latest commit on main branch
- refs/tags/v1.0 → points to tagged release commit
- HEAD → symbolic ref to current branch

Delta Encoding

MediaGit uses zstd dictionary-based delta encoding to store only differences between versions:

When Deltas Are Used

Scenario: Large file with small changes
Strategy: Store base version + delta to new version
Benefit: Significant storage savings (33–83% reduction depending on format; validated March 2026)

Delta Chain Example

Version 1: project.psd (100 MB) → Stored as full blob
Version 2: project.psd (100 MB) → Stored as delta from v1 (5 MB)
Version 3: project.psd (100 MB) → Stored as delta from v2 (3 MB)

Total storage: 108 MB instead of 300 MB (64% reduction)

Delta Chain Limits

Maximum chain depth: 10 (MAX_DELTA_DEPTH)
After depth exceeded, next version stored as new base
mediagit gc optimizes long chains

Compression Strategy

MediaGit employs intelligent compression based on file type:

Compression Algorithms

zstd (default): Fastest, good ratio for all file types
brotli: Better ratio for text/code, slower
delta: Zstd dictionary compression (chunk-level)

Automatic Selection

.psd, .psb → zstd level 3 (preserve layers)
.mp4, .mov → store (already compressed)
.txt, .md  → brotli level 6 (high text compression)
.blend     → zstd + delta (frequently updated 3D scenes)

Compression Levels

Fast: Quick compression, lower ratio (zstd level 1)
Default: Balanced speed/ratio (zstd level 3)
Best: Maximum compression, slower (zstd level 19)

Branching Model

MediaGit supports lightweight branches similar to Git:

Branch Storage

Branches are just files in refs/heads/
Each file contains a commit hash (64 hex characters for SHA-256)
Creating a branch = writing a 40-byte file (instant)

Branch Visualization

gitGraph
    commit id: "Initial commit"
    commit id: "Add base assets"
    branch feature-new-character
    checkout feature-new-character
    commit id: "Character model v1"
    commit id: "Character textures"
    checkout main
    commit id: "Update README"
    checkout feature-new-character
    commit id: "Character animations"
    checkout main
    merge feature-new-character
    commit id: "Release v1.0"

Branch Protection

Branches can be marked as protected
Protected branches require merge requests
Prevents force-push and deletion

Merge Strategies

MediaGit provides multiple merge strategies:

1. Fast-Forward Merge

When: Target branch is direct ancestor
Action: Just move branch pointer forward
Result: Linear history, no merge commit
Use: Feature branches with no conflicts

2. Three-Way Merge

When: Branches have diverged
Action: Find LCA (Lowest Common Ancestor), apply both changesets
Result: Merge commit with two parents
Use: Concurrent work on different files

3. Media-Aware Merge

When: Merging structured media (PSD, video, audio)
Action: Parse file format, merge layers/tracks/channels
Result: Merged media file preserving structure
Use: Collaborative media editing

4. Rebase

When: Want linear history
Action: Replay commits on top of target branch
Result: No merge commit, clean history
Use: Preparing feature branch for merge

Conflict Resolution

Text Conflicts

Layer 1: Blue Background

Media Conflicts

MediaGit detects conflicting layers in PSD files:

Conflict in large-file.psd:
  - Layer "Background" modified in both branches
  - Your version: Blue (#0000FF)
  - Their version: Red (#FF0000)

Resolution options:
  1. Keep yours (blue)
  2. Keep theirs (red)
  3. Manual merge (open in Photoshop)

Storage Abstraction

MediaGit separates storage interface from implementation:

graph TB
    App[Application Code] --> Trait[Backend Trait]
    Trait --> Local[LocalBackend]
    Trait --> S3[S3Backend]
    Trait --> Azure[AzureBackend]
    Trait --> GCS[GCSBackend]
    Trait --> B2[B2Backend]
    Trait --> MinIO[MinIOBackend]
    Trait --> Spaces[SpacesBackend]

    style Trait fill:#e1f5ff

Backend Trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait Backend: Send + Sync {
    async fn get(&self, key: &str) -> Result<Vec<u8>>;
    async fn put(&self, key: &str, data: &[u8]) -> Result<()>;
    async fn exists(&self, key: &str) -> Result<bool>;
    async fn delete(&self, key: &str) -> Result<()>;
    async fn list(&self, prefix: &str) -> Result<Vec<String>>;
}
}

Benefits

Testability: Mock backends for unit tests
Flexibility: Swap backends without code changes
Extensibility: Add new backends by implementing trait

Garbage Collection

Over time, unreachable objects accumulate (orphaned by branch deletion, rebases, etc.).

What Gets Collected

Objects not reachable from any branch or tag
Dangling blobs from incomplete operations
Long delta chains (recompressed)

GC Process

Mark Phase: Traverse from all refs, mark reachable objects
Sweep Phase: Delete unmarked objects
Repack Phase: Optimize delta chains, recompress

Safety

Preserves recent objects (default: 2 weeks grace period)
Dry-run mode to preview deletions
Backup recommended before aggressive GC

Repository Structure

.mediagit/
├── config                  # Repository configuration
├── HEAD                    # Current branch pointer
├── refs/
│   ├── heads/             # Branch pointers
│   │   ├── main
│   │   └── feature-branch
│   └── tags/              # Tag pointers
│       └── v1.0
├── objects/               # Object database (CAS)
│   ├── 5a/
│   │   └── 91b5b522...   # Blob object
│   └── f3/
│       └── a5d3c1e8...   # Tree object
└── logs/                  # Reflog (operation history)
    └── HEAD

Performance Considerations

Object Packing

Small objects stored individually
Large objects chunked for streaming
Frequently accessed objects cached in memory

Network Optimization

Object transfer uses HTTP/2 multiplexing
Parallel object fetch (configurable workers)
Automatic retry with exponential backoff

Disk I/O

Async I/O prevents thread blocking
Memory-mapped files for large objects
Sequential writes for optimal SSD performance

Keyboard shortcuts

MediaGit-Core Documentation