File Formats

Internal file format reference for MediaGit’s on-disk data structures.

Repository Layout

<repo-root>/
└── .mediagit/
    ├── HEAD              # Current branch or commit pointer
    ├── config.toml       # Repository configuration (TOML)
    ├── objects/          # Content-addressable object database
    │   ├── <xx>/         # Two-character prefix directories
    │   │   └── <hash>    # Object files (remaining 62 hex chars of SHA-256)
    │   └── pack/         # Pack files (future)
    ├── refs/             # Reference storage
    │   └── heads/        # Branch refs
    │       └── main      # Branch pointer files
    ├── manifests/        # Chunk manifests per committed file
    │   └── <hash>.bin    # Bincode-serialized ChunkManifest
    └── stats/            # Operation statistics (non-critical)
        └── <timestamp>.json

Object Format

All objects are stored content-addressably. The object’s SHA-256 hash (over the uncompressed content) is used as the key. The key is split into a 2-character directory prefix and 62-character filename:

objects/ab/cdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890
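
The key-to-path mapping can be sketched in a few lines of Python. This is an illustration only, assuming the hash is taken over the raw uncompressed bytes exactly as described above; the function name `object_path` is hypothetical and not part of MediaGit.

```python
import hashlib
from pathlib import PurePosixPath


def object_path(content: bytes) -> PurePosixPath:
    """Return the path of an object relative to .mediagit/ (hypothetical helper)."""
    # Key: SHA-256 over the uncompressed content, as 64 lowercase hex characters.
    digest = hashlib.sha256(content).hexdigest()
    # First 2 characters name the directory; the remaining 62 name the file.
    return PurePosixPath("objects") / digest[:2] / digest[2:]
```

Because the path is derived from the content alone, storing the same bytes twice resolves to the same file.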

Object Types

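| Type   | Description                                                  |
|--------|--------------------------------------------------------------|
| blob   | File content (full or delta)                                 |
| tree   | Directory listing: maps filenames to object hashes           |
| commit | Commit metadata: tree hash, parent hashes, author, message   |
| chunk  | A content chunk from a chunked large file                    |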

Object Storage

Objects are stored compressed (Zstd) or uncompressed (Store), depending on the file type. The compression strategy is selected automatically per file extension:

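| Format class                                     | Strategy               |
|--------------------------------------------------|------------------------|
| JPEG, PNG, WebP, MP4, MOV, ZIP, docx, PDF, AI    | Store (no compression) |
| PSD, 3D models (OBJ, FBX, GLB, STL, PLY)         | Zstd Best              |
| WAV, FLAC                                        | Zstd Default           |
| Text, JSON, TOML, CSV                            | Zstd Default           |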
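
As a rough illustration of per-extension selection, the table can be expressed as a lookup. The extension spellings (for example `jpg` alongside `jpeg`, or `txt` for "Text"), the strategy labels, and the function name `strategy_for` are assumptions made for this sketch. The source does not say what happens for unlisted extensions, so the sketch returns `None` rather than guessing a fallback.

```python
from pathlib import PurePosixPath

STORE = "store"                # stored as-is, no compression
ZSTD_BEST = "zstd-best"        # Zstd at its highest-ratio setting
ZSTD_DEFAULT = "zstd-default"  # Zstd at its default setting

# Assumed extension spellings for the format classes in the table.
_STRATEGY_BY_EXTENSION = {
    **dict.fromkeys(
        ["jpg", "jpeg", "png", "webp", "mp4", "mov", "zip", "docx", "pdf", "ai"],
        STORE,
    ),
    **dict.fromkeys(["psd", "obj", "fbx", "glb", "stl", "ply"], ZSTD_BEST),
    **dict.fromkeys(["wav", "flac", "txt", "json", "toml", "csv"], ZSTD_DEFAULT),
}


def strategy_for(path: str):
    """Return the strategy label for a file path, or None if the extension is unlisted."""
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    return _STRATEGY_BY_EXTENSION.get(extension)
```

Already-compressed formats map to Store because recompressing them costs time for little or no size reduction.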

Delta objects reference a base object and store only the difference.


Chunk Manifests

Large files are split into content-addressable chunks. The mapping of file → ordered list of chunks is stored as a ChunkManifest serialized with postcard (compact binary format):

.mediagit/manifests/<content-hash>.bin

The manifest contains:

  • File path
  • Total file size
  • Ordered list of chunks, each with:
    • Chunk hash (SHA-256 of chunk content)
    • Chunk offset in the original file
    • Chunk size (uncompressed)
    • Whether the chunk is stored as full or delta
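
The manifest fields listed above can be modelled as follows. This is a sketch, not the actual serialized layout: the class and field names are hypothetical, and the consistency check assumes that chunks are contiguous and non-overlapping, which the source does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkEntry:
    hash_hex: str   # SHA-256 of the chunk content, hex-encoded
    offset: int     # byte offset of the chunk in the original file
    size: int       # uncompressed chunk size in bytes
    is_delta: bool  # True if stored as a delta, False if stored in full


@dataclass(frozen=True)
class ChunkManifest:
    path: str
    total_size: int
    chunks: List[ChunkEntry]


def covers_file(manifest: ChunkManifest) -> bool:
    """Check that the ordered chunks tile the file from byte 0 to total_size."""
    expected_offset = 0
    for chunk in manifest.chunks:
        if chunk.offset != expected_offset:
            return False  # gap or overlap between consecutive chunks
        expected_offset += chunk.size
    return expected_offset == manifest.total_size
```

A check like this is a cheap sanity test before reassembling a file from its chunks.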

Reference Format

Refs are plain text files containing a 64-character hex SHA-256 hash followed by a newline:

.mediagit/refs/heads/main
a3c8f9d2e1b4f6a8c5d7e9f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1

HEAD contains either:

  • A symbolic ref (pointing to a branch): ref: refs/heads/main
  • A detached commit hash: a3c8f9d2e1b4f6a8c5d7e9f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1
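
A reader of HEAD has to tell the two forms apart. The sketch below shows one way to do that; the function name `parse_head` and the tolerance for surrounding whitespace are assumptions, not documented MediaGit behaviour.

```python
import re

# A full SHA-256 hash: exactly 64 lowercase hex characters.
_HASH_RE = re.compile(r"[0-9a-f]{64}")


def parse_head(text: str):
    """Classify the contents of HEAD as ('symbolic', ref_name) or ('detached', hash)."""
    value = text.strip()
    if value.startswith("ref: "):
        return ("symbolic", value[len("ref: "):])
    if _HASH_RE.fullmatch(value):
        return ("detached", value)
    raise ValueError("HEAD is neither a symbolic ref nor a 64-character hex hash")
```

For a symbolic ref, the returned name is itself a path under .mediagit/ whose file holds the commit hash in the format shown above.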

Configuration Format

.mediagit/config.toml is a standard TOML file. See Configuration Reference for all supported keys.

[author]
name = "Alice Smith"
email = "alice@example.com"

[storage]
backend = "filesystem"
base_path = "./data"

[compression]
enabled = true
algorithm = "zstd"
level = 3

Commit Object Format

Commits are stored as Bincode-serialized structs containing:

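| Field     | Type           | Description                                                                                   |
|-----------|----------------|-----------------------------------------------------------------------------------------------|
| tree      | [u8; 32]       | SHA-256 hash of the root tree object                                                          |
| parents   | Vec<[u8; 32]>  | Parent commit hashes (none for the initial commit, one for a regular commit, two or more for a merge) |
| author    | string         | Author name                                                                                   |
| email     | string         | Author email                                                                                  |
| timestamp | i64            | Unix timestamp (seconds)                                                                      |
| message   | string         | Commit message                                                                                |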
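
To make the field semantics concrete, here is an in-memory model of a commit. It mirrors the fields in the table but does not reproduce the Bincode encoding. The class name, the `kind` helper, and the reading that a single parent denotes an ordinary commit are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

HASH_LEN = 32  # a raw SHA-256 digest is 32 bytes


@dataclass(frozen=True)
class Commit:
    tree: bytes                 # hash of the root tree object
    parents: Tuple[bytes, ...]  # parent commit hashes, possibly empty
    author: str
    email: str
    timestamp: int              # Unix time in seconds
    message: str

    def __post_init__(self):
        # Every hash field must be a raw 32-byte digest.
        for digest in (self.tree, *self.parents):
            if len(digest) != HASH_LEN:
                raise ValueError("hashes must be exactly 32 bytes")

    def kind(self) -> str:
        """Classify the commit by its number of parents."""
        if not self.parents:
            return "initial"
        return "regular" if len(self.parents) == 1 else "merge"
```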

Tree Object Format

Trees are stored as Bincode-serialized ordered lists of entries:

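| Field   | Type     | Description                              |
|---------|----------|------------------------------------------|
| name    | string   | Filename (not full path)                 |
| hash    | [u8; 32] | SHA-256 hash of the blob or subtree      |
| is_tree | bool     | true for subdirectory, false for file    |
| size    | u64      | Uncompressed size in bytes               |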
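
Because each entry stores only a filename, resolving a full path means walking one tree per path component. The sketch below does this against a hypothetical in-memory mapping from tree hash to entry list; a real implementation would load and deserialize each tree from the object database. The names `TreeEntry` and `resolve` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TreeEntry:
    name: str      # filename only, never a full path
    hash: bytes    # 32-byte hash of the blob or subtree
    is_tree: bool  # True for a subdirectory, False for a file
    size: int      # uncompressed size in bytes


def resolve(
    trees: Dict[bytes, List[TreeEntry]], root_hash: bytes, path: str
) -> Optional[TreeEntry]:
    """Walk 'a/b/c' one component at a time, starting from the root tree."""
    parts = [part for part in path.split("/") if part]
    current = root_hash
    entry = None
    for index, part in enumerate(parts):
        entries = trees.get(current)
        if entries is None:
            return None  # tree object not available
        entry = next((e for e in entries if e.name == part), None)
        if entry is None:
            return None  # no entry with that name
        if index < len(parts) - 1 and not entry.is_tree:
            return None  # a file cannot have children
        current = entry.hash
    return entry
```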

Statistics Format

Operation statistics are written as JSON to .mediagit/stats/:

{
  "operation": "add",
  "timestamp": "2026-02-20T10:30:00Z",
  "files_processed": 42,
  "bytes_input": 1073741824,
  "bytes_stored": 157286400,
  "chunks_created": 512,
  "chunks_deduplicated": 87,
  "duration_ms": 4320
}

These files are informational only and can be deleted without affecting repository integrity.
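
Since the files are plain JSON, they are easy to post-process. As an example, the following sketch derives the space saved and the stored-to-input ratio from a single record. The function name `storage_summary` is hypothetical, and only the `bytes_input` and `bytes_stored` keys from the example above are used.

```python
import json


def storage_summary(stats_json: str) -> dict:
    """Summarize how much space one recorded operation saved."""
    stats = json.loads(stats_json)
    bytes_input = stats["bytes_input"]
    bytes_stored = stats["bytes_stored"]
    return {
        "bytes_saved": bytes_input - bytes_stored,
        # Fraction of the input that ended up on disk; None for empty input.
        "stored_ratio": bytes_stored / bytes_input if bytes_input else None,
    }
```

For the record shown above, 150 MiB is stored for 1 GiB of input, so roughly 15% of the input size ends up on disk.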


Hashing

MediaGit uses SHA-256 for all content hashing:

  • Object identity: SHA-256 of the uncompressed object content
  • Chunk identity: SHA-256 of the uncompressed chunk content
  • Commit hash: SHA-256 of the serialized commit object
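
Because identity is derived from content, an object can be checked against its own location on disk. The sketch below recomputes the hash of already-decompressed content and compares it with the key reassembled from the directory prefix and filename. The helper `verify_object` is hypothetical, and decompression is assumed to have happened before the call.

```python
import hashlib


def verify_object(prefix: str, filename: str, content: bytes) -> bool:
    """Return True if the uncompressed content hashes to the key implied by its path."""
    expected = (prefix + filename).lower()
    if len(expected) != 64:
        return False  # not a complete SHA-256 hex key
    return hashlib.sha256(content).hexdigest() == expected
```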

See Also