V’Ger

V’Ger is a fast, encrypted, deduplicated backup tool written in Rust. It is configured with a simple YAML format and includes a desktop GUI and a WebDAV server for viewing snapshots.

Features

  • Deduplication via FastCDC content-defined chunking
  • Compression with LZ4 (default), Zstandard, or none
  • Encryption with auto-selected AES-256-GCM or ChaCha20-Poly1305 and Argon2id key derivation
  • Storage backends via Apache OpenDAL (local filesystem, S3-compatible storage, SFTP)
  • YAML-based configuration with environment variable expansion
  • Dedicated REST server with append-only enforcement, quotas, and server-side compaction
  • Built-in web interface (WebDAV) to browse and restore snapshots
  • Rate limiting for CPU, disk I/O, and network bandwidth
  • Hooks for monitoring, database backups, and custom scripts
  • Desktop GUI (work in progress)

Inspired by

  • BorgBackup: architecture, chunking strategy, repository concept, and overall backup pipeline.
  • Borgmatic: YAML configuration approach.
  • Rustic: storage backend abstraction via Apache OpenDAL, pack file design, and architectural references from a mature Rust backup tool.
  • V’Ger from Star Trek: The Motion Picture — a probe that assimilated everything it encountered and returned as something far more powerful.

Comparison

| Aspect | Borg | Restic | Rustic | V’Ger |
|---|---|---|---|---|
| Configuration | CLI (YAML via Borgmatic) | CLI (YAML via ResticProfile) | TOML config file | YAML config with env-var expansion |
| Browse snapshots | FUSE mount | FUSE mount | FUSE mount | Built-in WebDAV + web UI |
| Hooks | Via Borgmatic | Via ResticProfile | Native | Native (per-command before/after) |
| Rate limiting | None | Upload/download bandwidth | | CPU, disk I/O, and network bandwidth |
| Dedicated server | SSH (borg serve) | rest-server (append-only) | rustic_server | REST server with append-only, quotas, server-side compaction |
| Desktop GUI | Vorta (third-party) | Third-party (Backrest) | None | Built-in (work in progress) |
| Scheduling | Via Borgmatic | Via ResticProfile | External (cron/systemd) | Built-in |
| Language | Python + Cython | Go | Rust | Rust |
| Chunker | Buzhash (custom) | Rabin | Rabin (Restic-compat) | FastCDC |
| Encryption | AES-CTR+HMAC / AES-OCB / ChaCha20 | AES-256-CTR + Poly1305-AES | AES-256-CTR + Poly1305-AES | AES-256-GCM / ChaCha20-Poly1305 (auto-select at init) |
| Key derivation | PBKDF2 or Argon2id | scrypt | scrypt | Argon2id |
| Serialization | msgpack | JSON + Protocol Buffers | JSON + Protocol Buffers | msgpack |
| Storage | borgstore + SSH RPC | Local, S3, SFTP, REST, rclone | OpenDAL (local, S3, many more) | OpenDAL (local, S3, SFTP) + vger-server |
| Repo compatibility | Borg v1/v2/v3 | Restic format | Restic-compatible | Own format |

Usage

Reference

Quick Start

Install

Download a pre-built binary from the releases page, or build from source:

cargo build --release

The binary is at target/release/vger. See Installing for more details.

Create a config file

Generate a starter configuration in the current directory:

vger config

Or write it to a specific path:

vger config --dest ~/.config/vger/config.yaml

Edit the generated vger.yaml to set your repository path and source directories. Encryption is enabled by default. See Configuration for a full reference.

Initialize and back up

Initialize the repository (prompts for passphrase if encrypted):

vger init

Create a backup of all configured sources:

vger backup

Inspect snapshots

List all snapshots:

vger list

Show repository statistics:

vger info

List files inside a snapshot (use the hex ID from vger list):

vger list --snapshot a1b2c3d4

Restore

Restore files from a snapshot to a directory:

vger restore --snapshot a1b2c3d4 --dest /tmp/restored

For backup options, snapshot browsing, and maintenance tasks, see the workflow guides.

Installing

Pre-built binaries

Download the latest release for your platform from the releases page.

Extract the archive and place the vger binary somewhere on your PATH:

# Example for Linux/macOS
tar xzf vger-*.tar.gz
sudo cp vger /usr/local/bin/

Build from source

Requires Rust 1.88 or later.

git clone https://github.com/borgbase/vger.git
cd vger
cargo build --release

The binary is at target/release/vger. Copy it to a directory on your PATH:

cp target/release/vger /usr/local/bin/

Verify installation

vger --version

Next steps

Initialize and Set Up a Repository

Generate a configuration file

Create a starter config in the current directory:

vger config

Or write it to a specific path:

vger config --dest ~/.config/vger/config.yaml

Encryption

Encryption is enabled by default (mode: "auto"). During init, vger benchmarks AES-256-GCM and ChaCha20-Poly1305, chooses one, and stores that concrete mode in the repository config. No config is needed unless you want to force a mode or disable encryption with mode: "none".

The passphrase is requested interactively at init time. You can also supply it via:

  • VGER_PASSPHRASE environment variable
  • passcommand in the config (e.g. passcommand: "pass show vger")

Configure repositories and sources

Set the repository URL and the directories to back up:

repositories:
  - url: "/backup/repo"
    label: "main"

sources:
  - "/home/user/documents"
  - "/home/user/photos"

See Configuration for all available options.

Initialize the repository

vger init

This creates the repository structure at the configured URL. For encrypted repositories, you will be prompted to enter a passphrase.

Validate

Confirm the repository was created:

vger info

Run a first backup and check results:

vger backup
vger list

Next steps

Storage Backends

V’Ger uses Apache OpenDAL for storage abstraction. The repository URL in your config determines which backend is used.

| Backend | URL example | Feature flag |
|---|---|---|
| Local filesystem | /backups/repo | — (always available) |
| S3 / S3-compatible | s3://bucket/prefix | — (always available) |
| SFTP | sftp://host/path | backend-sftp |
| REST (vger-server) | https://host/repo | backend-rest |

Local filesystem

Store backups on a local or mounted disk. No extra configuration needed.

repositories:
  - url: "/backups/repo"
    label: "local"

Accepted URL formats: absolute paths (/backups/repo), relative paths (./repo), or file:///backups/repo.

S3 / S3-compatible

Store backups in Amazon S3 or any S3-compatible service (MinIO, Wasabi, Backblaze B2, etc.).

AWS S3:

repositories:
  - url: "s3://my-bucket/vger"
    label: "s3"
    region: "us-east-1"                    # Default if omitted
    # access_key_id: "AKIA..."            # Optional; uses AWS SDK defaults if omitted
    # secret_access_key: "..."

S3-compatible (custom endpoint):

When the URL host contains a dot or a port, it’s treated as a custom endpoint and the first path segment is the bucket:

repositories:
  - url: "s3://minio.local:9000/my-bucket/vger"
    label: "minio"
    region: "us-east-1"
    access_key_id: "minioadmin"
    secret_access_key: "minioadmin"
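The endpoint detection described above can be sketched roughly as follows. The function name and return shape are illustrative, not vger's actual internals: a host containing a dot or a port is treated as a custom endpoint, and the bucket becomes the first path segment.

```rust
/// Illustrative sketch of the documented URL heuristic.
/// Returns (custom_endpoint, bucket, prefix).
fn split_s3_url(url: &str) -> Option<(Option<String>, String, String)> {
    let rest = url.strip_prefix("s3://")?;
    let (host, path) = match rest.split_once('/') {
        Some((h, p)) => (h, p),
        None => (rest, ""),
    };
    if host.contains('.') || host.contains(':') {
        // Custom endpoint: first path segment is the bucket.
        let (bucket, prefix) = match path.split_once('/') {
            Some((b, p)) => (b, p),
            None => (path, ""),
        };
        Some((Some(host.to_string()), bucket.to_string(), prefix.to_string()))
    } else {
        // Plain AWS form: host is the bucket.
        Some((None, host.to_string(), path.to_string()))
    }
}
```

With this rule, s3://my-bucket/vger keeps the default AWS endpoint, while s3://minio.local:9000/my-bucket/vger targets the MinIO host.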

S3 configuration options

| Field | Description |
|---|---|
| region | AWS region (default: us-east-1) |
| access_key_id | AWS access key (falls back to AWS SDK defaults) |
| secret_access_key | AWS secret key |
| endpoint | Override the endpoint derived from the URL |

SFTP

Store backups on a remote server via SFTP.

Requires building with the backend-sftp feature flag (see Building with optional backends below).

repositories:
  - url: "sftp://backup@nas.local/backups/vger"
    label: "nas"
    # sftp_key: "/home/user/.ssh/id_rsa"  # Path to private key (optional)

URL format: sftp://[user@]host[:port]/path. Default port is 22.

SFTP configuration options

| Field | Description |
|---|---|
| sftp_key | Path to SSH private key (defaults to ~/.ssh/id_rsa) |

REST (vger-server)

Store backups on a dedicated vger-server instance via HTTP/HTTPS. The server provides append-only enforcement, quotas, lock management, and server-side compaction.

Requires building with the backend-rest feature flag (see Building with optional backends below).

repositories:
  - url: "https://backup.example.com/myrepo"
    label: "server"
    rest_token: "my-secret-token"          # Bearer token for authentication

REST configuration options

| Field | Description |
|---|---|
| rest_token | Bearer token sent as Authorization: Bearer <token> |

See Server Mode for how to set up and configure the server.

Building with optional backends

Local and S3 backends are always available. SFTP and REST require feature flags at build time:

# All backends
cargo build --release --features backend-sftp,backend-rest

# Just SFTP
cargo build --release --features backend-sftp

# Just REST
cargo build --release --features backend-rest

Pre-built binaries from the releases page include all backends.

Make a Backup

Run a backup

Back up all configured sources to all configured repositories:

vger backup

By default, V’Ger preserves filesystem extended attributes (xattrs). Configure this globally with xattrs.enabled, and override it per source in rich source entries.

Sources and labels

Each source in your config produces its own snapshot. When you use the rich source form, the label field gives each source a short name you can reference from the CLI:

sources:
  - path: "/home/user/documents"
    label: "docs"
  - path: "/home/user/photos"
    label: "photos"

For simple string sources (e.g. - "/home/user/documents"), the label is derived automatically from the directory name (documents).
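That automatic derivation amounts to taking the last path component. A minimal sketch (not vger's actual code):

```rust
/// Illustrative: derive a source label from the directory name,
/// as described for simple string sources.
fn derive_label(path: &str) -> &str {
    path.trim_end_matches('/')
        .rsplit('/')
        .next()
        .unwrap_or(path)
}
```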

Back up only a specific source by label:

vger backup --source docs

When targeting a specific repository, use --repo:

vger --repo local backup --source docs

Label backups

Annotate a snapshot with a label, for example before a system change:

vger backup --label before-upgrade

This is separate from source labels — it tags the resulting snapshot so you can identify it later in vger list output.

List and verify snapshots

# List all snapshots
vger list

# List the 5 most recent snapshots
vger list --last 5

# List snapshots for a specific source
vger list --source docs

# List files inside a snapshot
vger list --snapshot a1b2c3d4

Restore a Backup

Locate snapshots

# List all snapshots
vger list

# List the 5 most recent snapshots
vger list --last 5

# List snapshots for a specific source
vger list --source docs

Inspect snapshot contents

# List files inside a snapshot
vger list --snapshot a1b2c3d4

Restore to a directory

# Restore all files from a snapshot
vger restore --snapshot a1b2c3d4 --dest /tmp/restored

Restore applies extended attributes (xattrs) by default. Control this with the top-level xattrs.enabled config setting.

Browse via WebDAV (mount)

Browse snapshot contents via a local WebDAV server.

# Serve all snapshots (default: http://127.0.0.1:8080)
vger mount

# Serve a single snapshot
vger mount --snapshot a1b2c3d4

# Only snapshots from a specific source
vger mount --source docs

# Custom listen address
vger mount --address 127.0.0.1:9090

Maintenance

Delete a snapshot

# Delete a specific snapshot by ID
vger delete --snapshot a1b2c3d4

Prune old snapshots

Apply the retention policy defined in your configuration to remove expired snapshots.

vger prune

Verify repository integrity

# Structural integrity check
vger check

# Full data verification (reads and verifies every chunk)
vger check --verify-data

Compact (reclaim space)

After delete or prune, blob data remains in pack files. Run compact to rewrite packs and reclaim disk space.

# Preview what would be repacked
vger compact --dry-run

# Repack to reclaim space
vger compact

Configuration

V’Ger is driven by a YAML configuration file. Generate a starter config with:

vger config

Config file locations

V’Ger automatically finds config files in this order:

  1. --config <path> flag
  2. VGER_CONFIG environment variable
  3. ./vger.yaml (project)
  4. $XDG_CONFIG_HOME/vger/config.yaml or ~/.config/vger/config.yaml (user)
  5. /etc/vger/config.yaml (system)

You can also set VGER_PASSPHRASE to supply the passphrase non-interactively.

Minimal example

A complete but minimal working config. Encryption defaults to auto (init benchmarks AES-256-GCM vs ChaCha20-Poly1305 and pins the repo), so you only need repositories and sources:

repositories:
  - url: "/backup/repo"

sources:
  - "/home/user/documents"

Repositories

Local:

repositories:
  - url: "/backups/repo"
    label: "local"

S3:

repositories:
  - url: "s3://my-bucket/vger"
    label: "s3"
    region: "us-east-1"

Each entry accepts an optional label for CLI targeting (vger --repo local list) and optional pack size tuning (min_pack_size, max_pack_size). See Storage Backends for all backend-specific options.

Sources

Sources can be a simple list of paths (auto-labeled from directory name) or rich entries with per-source options.

Simple form:

sources:
  - "/home/user/documents"
  - "/home/user/photos"

Rich form (single path):

sources:
  - path: "/home/user/documents"
    label: "docs"
    exclude: ["*.tmp", ".cache/**"]
    # exclude_if_present: [".nobackup", "CACHEDIR.TAG"]
    # one_file_system: true
    # git_ignore: false
    repos: ["main"]                  # Only back up to this repo (default: all)
    retention:
      keep_daily: 7
    hooks:
      before: "echo starting docs backup"

Rich form (multiple paths):

Use paths (plural) to group several directories into a single source. An explicit label is required:

sources:
  - paths:
      - "/home/user/documents"
      - "/home/user/notes"
    label: "writing"
    exclude: ["*.tmp"]

These directories are backed up together as one snapshot. You cannot use both path and paths on the same entry.

Encryption

Encryption is enabled by default (auto mode with Argon2id key derivation). You only need an encryption section to supply a passcommand, force a specific algorithm, or disable encryption:

encryption:
  # mode: "auto"                     # Default — benchmark at init and persist chosen mode
  # mode: "aes256gcm"                # Force AES-256-GCM
  # mode: "chacha20poly1305"         # Force ChaCha20-Poly1305
  # mode: "none"                     # Disable encryption
  # passphrase: "inline-secret"      # Not recommended for production
  # passcommand: "pass show borg"    # Shell command that prints the passphrase

Compression

compression:
  algorithm: "lz4"                   # "lz4", "zstd", or "none"
  zstd_level: 3                      # Only used with zstd

Chunker

chunker:                             # Optional, defaults shown
  min_size: 524288                   # 512 KiB
  avg_size: 2097152                  # 2 MiB
  max_size: 8388608                  # 8 MiB

Exclude Patterns

exclude_patterns:                    # Global gitignore-style patterns (merged with per-source)
  - "*.tmp"
  - ".cache/**"
exclude_if_present:                  # Skip dirs containing any marker file
  - ".nobackup"
  - "CACHEDIR.TAG"
one_file_system: true                # Do not cross filesystem/mount boundaries (default true)
git_ignore: false                    # Respect .gitignore files (default false)
xattrs:                              # Extended attribute handling
  enabled: true                      # Preserve xattrs on backup/restore (default true)

Retention

retention:                           # Global retention policy (can be overridden per-source)
  keep_last: 10
  keep_daily: 7
  keep_weekly: 4
  keep_monthly: 6
  keep_yearly: 2
  keep_within: "2d"                  # Keep everything within this period (e.g. "2d", "48h", "1w")
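Assuming only the h/d/w suffixes shown in the examples above (vger's real parser may accept more), a hypothetical keep_within parser would convert the period to seconds:

```rust
/// Hypothetical parser for keep_within values like "2d", "48h", "1w".
/// Returns the period in seconds, or None for unknown formats.
fn keep_within_secs(s: &str) -> Option<u64> {
    if s.len() < 2 {
        return None;
    }
    let (num, unit) = s.split_at(s.len() - 1);
    let n: u64 = num.parse().ok()?;
    match unit {
        "h" => Some(n * 3_600),
        "d" => Some(n * 86_400),
        "w" => Some(n * 604_800),
        _ => None,
    }
}
```

Note that "2d" and "48h" describe the same window.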

Limits

limits:                              # Optional backup resource limits
  cpu:
    max_threads: 0                   # 0 = default rayon behavior
    nice: 0                          # Unix niceness target (-20..19), 0 = unchanged
  io:
    read_mib_per_sec: 0              # Source file reads during backup
    write_mib_per_sec: 0             # Local repository writes during backup
  network:
    read_mib_per_sec: 0              # Remote backend reads during backup
    write_mib_per_sec: 0             # Remote backend writes during backup

Hooks

hooks:                               # Global hooks: run for every command
  before: "echo starting"
  after: "echo done"
  # before_backup: "echo backup starting"  # Command-specific hooks
  # failed: "notify-send 'vger failed'"
  # finally: "cleanup.sh"

Environment Variable Expansion

Config files support environment variable placeholders in values:

repositories:
  - url: "${VGER_REPO_URL:-/backup/repo}"
    # rest_token: "${VGER_REST_TOKEN}"

Supported syntax:

  • ${VAR}: requires VAR to be set (hard error if missing)
  • ${VAR:-default}: uses default when VAR is unset or empty

Notes:

  • Expansion runs on raw config text before YAML parsing.
  • Variable names must match [A-Za-z_][A-Za-z0-9_]*.
  • Malformed placeholders fail config loading.
  • No escape syntax is supported for literal ${...}.
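The rules above can be sketched as follows. The lookup function is abstracted for testability, and this is an illustration rather than vger's implementation — in particular, it treats a set-but-empty ${VAR} as missing, which may differ from vger's exact behavior:

```rust
/// Variable names must match [A-Za-z_][A-Za-z0-9_]*.
fn valid_name(name: &str) -> bool {
    let mut chars = name.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Expand ${VAR} and ${VAR:-default} placeholders in raw config text.
fn expand(raw: &str, lookup: &dyn Fn(&str) -> Option<String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = raw;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let inner_start = start + 2;
        let end = rest[inner_start..]
            .find('}')
            .ok_or_else(|| "malformed placeholder: missing '}'".to_string())?;
        let inner = &rest[inner_start..inner_start + end];
        let (name, default) = match inner.split_once(":-") {
            Some((n, d)) => (n, Some(d)),
            None => (inner, None),
        };
        if !valid_name(name) {
            return Err(format!("malformed placeholder: ${{{inner}}}"));
        }
        match (lookup(name).filter(|v| !v.is_empty()), default) {
            (Some(v), _) => out.push_str(&v),
            (None, Some(d)) => out.push_str(d), // unset or empty -> default
            (None, None) => return Err(format!("{name} is not set")),
        }
        rest = &rest[inner_start + end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```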

Multiple sources

Each source entry in rich form can override global settings. This lets you tailor backup behavior per directory:

sources:
  - path: "/home/user/documents"
    label: "docs"
    exclude: ["*.tmp"]
    xattrs:
      enabled: false                 # Override top-level xattrs setting for this source
    repos: ["local"]                 # Only back up to the "local" repo
    retention:
      keep_daily: 7
      keep_weekly: 4

  - path: "/home/user/photos"
    label: "photos"
    repos: ["local", "remote"]       # Back up to both repos
    retention:
      keep_daily: 30
      keep_monthly: 12
    hooks:
      after: "echo photos backed up"

Per-source fields that override globals: exclude, exclude_if_present, one_file_system, git_ignore, repos, retention, hooks.

Multiple repositories

Add more entries to repositories: to back up to multiple destinations. Top-level settings serve as defaults; each entry can override encryption, compression, retention, and limits.

repositories:
  - url: "/backups/local"
    label: "local"

  - url: "s3://bucket/remote"
    label: "remote"
    region: "us-east-1"
    encryption:
      passcommand: "pass show vger-remote"
    compression:
      algorithm: "zstd"             # Better ratio for remote
    retention:
      keep_daily: 30                 # Keep more on remote
    limits:
      cpu:
        max_threads: 2
      network:
        write_mib_per_sec: 25

When limits is set on a repository entry, it replaces top-level limits for that repository.

By default, commands operate on all repositories. Use --repo / -R to target a single one:

vger --repo local list
vger -R /backups/local list

Command Reference

| Command | Description |
|---|---|
| vger config | Generate a starter configuration file |
| vger init | Initialize a new backup repository |
| vger backup | Back up files to a new snapshot |
| vger list | List snapshots, or files within a snapshot |
| vger restore | Restore files from a snapshot |
| vger delete | Delete a specific snapshot |
| vger prune | Prune snapshots according to retention policy |
| vger check | Verify repository integrity (--verify-data for full content verification) |
| vger info | Show repository statistics (snapshot counts and size totals) |
| vger compact | Free space by repacking pack files after delete/prune |
| vger mount | Browse snapshots via a local WebDAV server |

Server Mode

V’Ger includes a dedicated backup server for secure, policy-enforced remote backups. TLS is expected to be handled by a reverse proxy such as nginx or Caddy.

Why a dedicated REST server instead of plain S3

Dumb storage backends (S3, WebDAV, SFTP) work well for basic backups, but they cannot enforce policy or do server-side work. vger-server adds capabilities that object storage alone cannot provide.

| Capability | S3 / dumb storage | vger-server |
|---|---|---|
| Append-only mode | Not enforceable; a compromised client with S3 credentials can delete anything | Rejects delete and pack overwrite operations |
| Server-side compaction | Client must download and re-upload all live blobs | Server repacks locally on disk from a compact plan |
| Quota enforcement | Requires external bucket policy/IAM setup | Built-in per-repo byte quota checks on writes |
| Backup freshness monitoring | Requires external polling and parsing | Tracks last_backup_at on manifest writes |
| Lock auto-expiry | Advisory locks can remain after crashes | TTL-based lock cleanup in the server |
| Structural health checks | Client has to fetch data to verify structure | Server validates repository shape directly |

All data remains client-side encrypted. The server never has the encryption key and cannot read backup contents.

Build the server

cargo build --release -p vger-server
# Binary at target/release/vger-server

Build the client with REST support

cargo build --release -p vger-cli --features vger-core/backend-rest

Server configuration

Create vger-server.toml:

[server]
listen = "127.0.0.1:8484"
data_dir = "/var/lib/vger"
token = "some-secret-token"
append_only = false              # true = reject all deletes
log_format = "pretty"           # "json" for structured logging

# Optional limits
# quota_bytes = 5368709120       # 5 GiB per-repo quota. 0 = unlimited.
# lock_ttl_seconds = 3600        # auto-expire locks after 1 hour (default)

Start the server

vger-server --config vger-server.toml

Client configuration (REST backend)

repositories:
  - url: "https://backup.example.com/myrepo"
    label: "server"
    rest_token: "some-secret-token"

encryption:
  mode: "auto"

sources:
  - "/home/user/documents"

All standard commands (init, backup, list, info, restore, delete, prune, check, compact) work over REST without CLI workflow changes.

Health check

# No auth required
curl http://localhost:8484/health

Returns server status, uptime, disk free space, and repository count.

Architecture

Technical reference for vger’s cryptographic, chunking, compression, and storage design decisions.


Cryptography

Encryption

AEAD with 12-byte random nonces (AES-256-GCM or ChaCha20-Poly1305).

Rationale:

  • Authenticated encryption with modern, audited constructions
  • auto mode benchmarks AES-256-GCM vs ChaCha20-Poly1305 at init and stores one concrete mode per repo
  • Strong performance across mixed CPU capabilities (with and without AES hardware acceleration)
  • 32-byte symmetric keys (simpler key management than split-key schemes)
  • The 1-byte type tag is passed as AAD (authenticated additional data), binding the ciphertext to its intended object type

Key Derivation

Argon2id for passphrase-to-key derivation.

Rationale:

  • Modern memory-hard KDF recommended by OWASP and IETF
  • Resists both GPU and ASIC brute-force attacks

Hashing / Chunk IDs

Keyed BLAKE2b-256 MAC using a chunk_id_key derived from the master key.

Rationale:

  • Prevents content confirmation attacks (an adversary cannot check whether known plaintext exists in the backup without the key)
  • BLAKE2b is faster than SHA-256 in software
  • Trade-off: keyed IDs prevent dedup across different encryption keys (acceptable for vger’s single-key-per-repo model)

Content Processing

Chunking

FastCDC (content-defined chunking) via the fastcdc v3 crate.

Default parameters: 512 KiB min, 2 MiB average, 8 MiB max (configurable in YAML).

Rationale:

  • Newer algorithm, benchmarks faster than Rabin fingerprinting
  • Good deduplication ratio with configurable chunk boundaries

Compression

Per-chunk compression with a 1-byte tag prefix. Supported algorithms: LZ4, ZSTD, and None.

Rationale:

  • Per-chunk tags allow mixing algorithms within a single repository
  • LZ4 for speed-sensitive workloads, ZSTD for better compression ratios
  • No repository-wide format version lock-in for compression choice

Deduplication

Content-addressed deduplication using keyed ChunkId values (BLAKE2b-256 MAC). Identical data produces the same ChunkId, so the second copy is never stored — only its refcount is incremented.

Two-level dedup check (in Repository::bump_ref_if_exists):

  1. Committed index — the persisted ChunkIndex loaded at repo open
  2. Pending pack writers — blobs buffered in the current data and tree PackWriter instances that haven’t been flushed yet

This two-level check prevents duplicates both across backups (via the committed index) and within a single backup run (via the pending writers). Refcounts are tracked at every level so that delete and compact can determine when a blob is truly orphaned.
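A minimal sketch of that two-level check follows. The real Repository::bump_ref_if_exists also tracks pack offsets and per-writer refcounts, which are omitted here:

```rust
use std::collections::{HashMap, HashSet};

type ChunkId = [u8; 32];

/// Illustrative model: committed index plus pending pack-writer blobs.
struct DedupState {
    committed: HashMap<ChunkId, u32>, // persisted ChunkIndex refcounts
    pending: HashSet<ChunkId>,        // blobs buffered in open PackWriters
}

impl DedupState {
    /// Returns true (bumping the refcount for committed chunks) if the
    /// chunk already exists at either level; false means a new blob
    /// must be stored.
    fn bump_ref_if_exists(&mut self, id: &ChunkId) -> bool {
        if let Some(rc) = self.committed.get_mut(id) {
            *rc += 1;
            return true;
        }
        self.pending.contains(id)
    }
}
```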


Serialization

All persistent data structures use msgpack via rmp_serde. Structs serialize as positional arrays (not named-field maps) for compactness. This means field order matters — adding or removing fields requires careful versioning, and #[serde(skip_serializing_if)] must not be used on Item fields (it would break positional deserialization of existing data).

RepoObj Envelope

Every encrypted object stored in the repository is wrapped in a RepoObj envelope (repo/format.rs):

[1-byte type_tag][12-byte nonce][ciphertext + 16-byte AEAD tag]

The type tag identifies the object kind via the ObjectType enum:

| Tag | ObjectType | Used for |
|---|---|---|
| 0 | Config | Repository configuration (stored unencrypted) |
| 1 | Manifest | Snapshot list |
| 2 | SnapshotMeta | Per-snapshot metadata |
| 3 | ChunkData | Compressed file/item-stream chunks |
| 4 | ChunkIndex | Chunk-to-pack mapping |
| 5 | PackHeader | Trailing header inside pack files |
| 6 | FileCache | File-level cache (inode/mtime skip) |
The type tag byte is passed as AAD (authenticated additional data) to the selected AEAD mode. This binds each ciphertext to its intended object type, preventing an attacker from substituting one object type for another (e.g., swapping a manifest for a snapshot).
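Splitting an envelope into its parts is a matter of fixed offsets; a hypothetical parser (field names illustrative, not vger's actual types):

```rust
/// Sketch of splitting a RepoObj envelope per the layout above.
struct Envelope<'a> {
    type_tag: u8,
    nonce: &'a [u8],      // 12 bytes
    ciphertext: &'a [u8], // includes the trailing 16-byte AEAD tag
}

fn parse_envelope(buf: &[u8]) -> Option<Envelope<'_>> {
    // Smallest valid envelope: 1-byte tag + 12-byte nonce + 16-byte AEAD tag.
    if buf.len() < 1 + 12 + 16 {
        return None;
    }
    Some(Envelope {
        type_tag: buf[0],
        nonce: &buf[1..13],
        ciphertext: &buf[13..],
    })
}
```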


Repository Format

On-Disk Layout

<repo>/
|- config                    # Repository metadata (unencrypted msgpack)
|- keys/repokey              # Encrypted master key (Argon2id-wrapped)
|- manifest                  # Encrypted snapshot list
|- index                     # Encrypted chunk index
|- snapshots/<id>            # Encrypted snapshot metadata
|- packs/<xx>/<pack-id>      # Pack files containing compressed+encrypted chunks (256 shard dirs)
`- locks/                    # Advisory lock files

Key Data Structures

ChunkIndex — HashMap<ChunkId, ChunkIndexEntry>, stored encrypted at the index key. The central lookup table for deduplication, restore, and compaction.

| Field | Type | Description |
|---|---|---|
| refcount | u32 | Number of snapshots referencing this chunk |
| stored_size | u32 | Size in bytes as stored (compressed + encrypted) |
| pack_id | PackId | Which pack file contains this chunk |
| pack_offset | u64 | Byte offset within the pack file |

Manifest — the encrypted snapshot list stored at the manifest key.

| Field | Type | Description |
|---|---|---|
| version | u32 | Format version (currently 1) |
| timestamp | DateTime | Last modification time |
| snapshots | Vec<SnapshotEntry> | One entry per snapshot |
Each SnapshotEntry contains: name, id (32-byte random), time, source_label, label, source_paths.

SnapshotMeta — per-snapshot metadata stored at snapshots/<id>.

| Field | Type | Description |
|---|---|---|
| name | String | User-provided snapshot name |
| hostname | String | Machine that created the backup |
| username | String | User that ran the backup |
| time / time_end | DateTime | Backup start and end timestamps |
| chunker_params | ChunkerConfig | CDC parameters used for this snapshot |
| item_ptrs | Vec<ChunkId> | Chunk IDs containing the serialized item stream |
| stats | SnapshotStats | File count; original/compressed/deduplicated sizes |
| source_label | String | Config label for the source |
| source_paths | Vec<String> | Directories that were backed up |
| label | String | User-provided annotation |

Item — a single filesystem entry within a snapshot’s item stream.

| Field | Type | Description |
|---|---|---|
| path | String | Relative path within the backup |
| entry_type | ItemType | RegularFile, Directory, or Symlink |
| mode | u32 | Unix permission bits |
| uid / gid | u32 | Owner and group IDs |
| user / group | Option<String> | Owner and group names |
| mtime | i64 | Modification time (nanoseconds since epoch) |
| atime / ctime | Option<i64> | Access and change times |
| size | u64 | Original file size |
| chunks | Vec<ChunkRef> | Content chunks (regular files only) |
| link_target | Option<String> | Symlink target |
| xattrs | Option<HashMap> | Extended attributes |

ChunkRef — reference to a stored chunk, used in Item.chunks:

| Field | Type | Description |
|---|---|---|
| id | ChunkId | Content-addressed chunk identifier |
| size | u32 | Uncompressed (original) size |
| csize | u32 | Stored size (compressed + encrypted) |

Pack Files

Chunks are grouped into pack files (~32 MiB) instead of being stored as individual files. This reduces file count by 1000x+, critical for cloud storage costs (fewer PUT/GET ops) and filesystem performance (fewer inodes).

Pack File Format

[8B magic "VGERPACK\0"][1B version=1]
[4B blob_0_len LE][blob_0_data]
[4B blob_1_len LE][blob_1_data]
...
[4B blob_N_len LE][blob_N_data]
[encrypted_header][4B header_length LE]

  • Per-blob length prefix (4 bytes): enables forward scanning to recover individual blobs even if the trailing header is corrupted
  • Each blob is a complete RepoObj envelope: [1B type_tag][12B nonce][ciphertext+16B AEAD tag]
  • Each blob is independently encrypted (can read one chunk without decrypting the whole pack)
  • Header at the END allows streaming writes without knowing final header size
  • Header is encrypted as pack_object(ObjectType::PackHeader, msgpack(Vec<PackHeaderEntry>))
  • Pack ID = unkeyed BLAKE2b-256 of entire pack contents, stored at packs/<shard>/<hex_pack_id>
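Forward-scan recovery over the length-prefixed blob region (here taken as the bytes between the magic/version preamble and the trailing header) could be sketched as:

```rust
/// Illustrative recovery scan: walk [4B len LE][data] records until a
/// record no longer fits (truncation, or we've hit the trailing header).
fn scan_blobs(region: &[u8]) -> Vec<&[u8]> {
    let mut blobs = Vec::new();
    let mut pos = 0;
    while pos + 4 <= region.len() {
        let len = u32::from_le_bytes(region[pos..pos + 4].try_into().unwrap()) as usize;
        pos += 4;
        if pos + len > region.len() {
            break;
        }
        blobs.push(&region[pos..pos + len]);
        pos += len;
    }
    blobs
}
```

Each recovered blob is a complete RepoObj envelope and can be decrypted independently.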

Data Packs vs Tree Packs

Two separate PackWriter instances:

  • Data packs — file content chunks. Dynamic target size.
  • Tree packs — item-stream metadata. Fixed at min(min_pack_size, 4 MiB) since metadata is small and read frequently.

Dynamic Pack Sizing

Pack sizes grow with repository size. Config exposes floor and ceiling:

repositories:
  - url: "/backups/repo"
    min_pack_size: 33554432     # 32 MiB (floor, default)
    max_pack_size: 536870912    # 512 MiB (ceiling, default)

Data pack sizing formula:

target = clamp(min_pack_size * sqrt(num_data_packs / 100), min_pack_size, max_pack_size)

| Data packs in repo | Target pack size |
|---|---|
| < 100 | 32 MiB (floor) |
| 1,000 | ~101 MiB |
| 10,000 | ~320 MiB |
| 30,000+ | 512 MiB (cap) |

num_data_packs is computed at open() by counting distinct pack_id values in the ChunkIndex (zero extra I/O).
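The sizing formula transcribes directly into code (function name illustrative):

```rust
/// target = clamp(min * sqrt(n / 100), min, max), per the formula above.
fn target_pack_size(min_pack_size: u64, max_pack_size: u64, num_data_packs: u64) -> u64 {
    let scaled = min_pack_size as f64 * (num_data_packs as f64 / 100.0).sqrt();
    (scaled as u64).clamp(min_pack_size, max_pack_size)
}
```

With the default 32 MiB floor and 512 MiB ceiling, this reproduces the table above.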


Data Flow

Backup Pipeline

walk sources (walkdir + exclude filters)
  → for each file: check file cache (device, inode, mtime, ctime, size)
    → [cache hit + all chunks in index] reuse cached ChunkRefs, bump refcounts
    → [cache miss] FastCDC content-defined chunking
      → for each chunk: compute ChunkId (keyed BLAKE2b-256)
        → dedup check (committed index + pending pack writers)
          → [new chunk] compress (LZ4/ZSTD) → encrypt (selected AEAD mode) → buffer into PackWriter
          → [dedup hit] increment refcount, skip storage
        → when PackWriter reaches target size → flush pack to packs/<shard>/<id>
  → serialize Item to msgpack → append to item stream buffer
    → when buffer reaches ~128 KiB → chunk as tree pack
→ flush remaining packs
→ build SnapshotMeta (with item_ptrs referencing tree pack chunks)
→ store SnapshotMeta at snapshots/<id>
→ update Manifest
→ save_state() (flush packs → persist manifest + index, save file cache locally)

Restore Pipeline

open repository → load Manifest → find snapshot by name
  → load SnapshotMeta from snapshots/<id>
    → read item_ptrs chunks (tree packs) → deserialize Vec<Item>
      → sort: directories first, then symlinks, then files
        → for each directory: create dir, set permissions
        → for each symlink: create symlink
        → for each file:
          → for each ChunkRef: read blob from pack → decrypt → decompress
          → write concatenated content to disk
          → restore permissions and mtime

Item Stream

Snapshot metadata (the list of files, directories, and symlinks) is not stored as a single monolithic blob. Instead:

  1. Items are serialized one-by-one as msgpack and appended to an in-memory buffer
  2. When the buffer reaches ~128 KiB, it is chunked and stored as a tree pack chunk (with a finer CDC config: 32 KiB min / 128 KiB avg / 512 KiB max)
  3. The resulting ChunkId values are collected into item_ptrs in the SnapshotMeta

This design means the item stream benefits from deduplication — if most files are unchanged between backups, the item-stream chunks are mostly identical and deduplicated away. It also avoids a memory spike from materializing all items at once.


Operations

Locking

Client-side advisory locks prevent concurrent mutating operations on the same repository.

  • Lock files are stored at locks/<timestamp>-<uuid>.json
  • Each lock contains: hostname, PID, and acquisition timestamp
  • Oldest-key-wins: after writing its lock, a client lists all locks — if its key isn’t lexicographically first, it deletes its own lock and returns an error
  • Stale cleanup: locks older than 6 hours are automatically removed before each acquisition attempt
  • Commands that lock: backup, delete, prune, compact
  • Read-only commands (no lock): list, extract, check, info
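The oldest-key-wins step can be sketched as follows. This is a minimal model that treats the lock store as an in-memory list of key strings; real vger lists lock objects under the locks/ prefix of the storage backend, and the `<timestamp>-<uuid>` naming makes lexicographic order equal age order (assuming fixed-width timestamps).

```rust
// Sketch of the oldest-key-wins lock protocol: after writing its own
// lock, a client lists all locks and keeps the lock only if its key
// sorts first; otherwise it deletes its own lock and reports a conflict.

fn acquire(own_key: &str, all_lock_keys: &mut Vec<String>) -> Result<(), String> {
    all_lock_keys.sort();
    let first = all_lock_keys.first().cloned();
    match first {
        Some(f) if f == own_key => Ok(()),
        Some(f) => {
            // An older lock wins: remove our own lock and back off.
            all_lock_keys.retain(|k| k != own_key);
            Err(format!("repository locked by {f}"))
        }
        None => Err("own lock missing".into()),
    }
}

fn main() {
    let mut locks = vec![
        "00000200-bbbb.json".to_string(), // ours (newer)
        "00000100-aaaa.json".to_string(), // someone else's (older)
    ];
    let result = acquire("00000200-bbbb.json", &mut locks);
    assert!(result.is_err()); // the older lock wins
    assert_eq!(locks.len(), 1); // our lock was removed
    println!("{result:?}");
}
```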

When using a vger server, server-managed locks with TTL replace client-side advisory locks (see Server Architecture).

Refcount Lifecycle

Chunk refcounts track how many snapshots reference each chunk, driving the dedup → delete → compact lifecycle:

  1. Backup — store_chunk() adds a new entry with refcount=1, or increments an existing entry’s refcount on a dedup hit
  2. Delete / prune — ChunkIndex::decrement() decreases the refcount; entries reaching 0 are removed from the index
  3. Orphaned blobs — after delete/prune, the encrypted blob data remains in pack files (the index no longer points to it, but the bytes are still on disk)
  4. Compact — rewrites packs to reclaim space from orphaned blobs

This design means delete is fast (just index updates), while space reclamation is deferred to compact.
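A minimal sketch of this lifecycle, using a HashMap as a stand-in for the real ChunkIndex and a u64 for ChunkId:

```rust
use std::collections::HashMap;

// Sketch of the refcount lifecycle on a chunk index.
struct ChunkIndex {
    refcounts: HashMap<u64, u32>,
}

impl ChunkIndex {
    /// Backup path: refcount=1 for new chunks, +1 on a dedup hit.
    /// Returns true if the chunk is new (caller must store its bytes).
    fn store_chunk(&mut self, id: u64) -> bool {
        let count = self.refcounts.entry(id).or_insert(0);
        *count += 1;
        *count == 1
    }

    /// Delete/prune path: -1; entries reaching 0 are removed.
    /// The blob bytes stay in the pack file until compact runs.
    fn decrement(&mut self, id: u64) {
        if let Some(count) = self.refcounts.get_mut(&id) {
            *count -= 1;
            if *count == 0 {
                self.refcounts.remove(&id);
            }
        }
    }
}

fn main() {
    let mut index = ChunkIndex { refcounts: HashMap::new() };
    assert!(index.store_chunk(42));   // new chunk: must be written
    assert!(!index.store_chunk(42));  // dedup hit: refcount only
    index.decrement(42);
    index.decrement(42);              // refcount hits 0 → entry removed
    assert!(!index.refcounts.contains_key(&42));
    println!("lifecycle ok");
}
```

Once the entry is gone from the index, the blob is "orphaned": invisible to lookups but still occupying pack-file bytes until compact rewrites the pack.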

Compact

After delete or prune, chunk refcounts are decremented and entries with refcount 0 are removed from the ChunkIndex — but the encrypted blob data remains in pack files. The compact command rewrites packs to reclaim this wasted space.

Algorithm

Phase 1 — Analysis (read-only):

  1. Enumerate all pack files across 256 shard dirs (packs/00/ through packs/ff/)
  2. Read each pack’s trailing header to get Vec<PackHeaderEntry>
  3. Classify each blob as live (exists in ChunkIndex at matching pack+offset) or dead
  4. Compute unused_ratio = dead_bytes / total_bytes per pack
  5. Filter packs where unused_ratio >= threshold (default 10%)

Phase 2 — Repack: For each candidate pack (most wasteful first, respecting --max-repack-size cap):

  1. If all blobs are dead → delete the pack file directly
  2. Otherwise: read live blobs as encrypted passthrough (no decrypt/re-encrypt cycle)
  3. Write into a new pack via a standalone PackWriter, flush to storage
  4. Update ChunkIndex entries to point to the new pack_id/offset
  5. save_state() — persist index before deleting old pack (crash safety)
  6. Delete old pack file

Crash Safety

The index never points to a deleted pack. Sequence: write new pack → save index → delete old pack. A crash between steps leaves an orphan old pack (harmless, cleaned up on next compact).

CLI

vger compact [--threshold 10] [--max-repack-size 2G] [-n/--dry-run]

Parallel Pipeline

During backup, the compress+encrypt phase runs in parallel using rayon:

  1. For each file, all chunks are classified as existing (dedup hit) or new
  2. New chunks are collected into a batch of TransformJob structs
  3. The batch is processed via rayon::par_iter — each job compresses and encrypts independently
  4. Results are inserted sequentially into the PackWriter (maintaining offset ordering)

This pattern keeps the critical section (pack writer insertion + index updates) single-threaded while parallelizing the CPU-heavy work.
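The pattern can be sketched with std::thread::scope standing in for rayon's par_iter (a dependency-free approximation, not vger's actual code); the transform closure is a placeholder for the real compress + encrypt step:

```rust
use std::thread;

// Sketch of the parallel-pipeline pattern: transform jobs run in
// parallel, then results are collected in submission order so the
// sequential insertion phase keeps deterministic pack offsets.

fn transform(job: u32) -> u32 {
    job.wrapping_mul(31) // placeholder for compress + encrypt
}

fn process_batch(jobs: Vec<u32>) -> Vec<u32> {
    // Parallel phase: one scoped thread per job (rayon would pool these).
    thread::scope(|s| {
        let handles: Vec<_> = jobs
            .iter()
            .map(|&job| s.spawn(move || transform(job)))
            .collect();
        // Sequential phase: join in submission order, mirroring the
        // single-threaded insertion into the PackWriter.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let out = process_batch(vec![1, 2, 3]);
    assert_eq!(out, vec![31, 62, 93]);
    println!("{out:?}");
}
```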

Configuration:

limits:
  cpu:
    max_threads: 4              # rayon thread pool size (0 = rayon default, all cores)
    nice: 10                    # Unix nice value for the backup process
  io:
    read_mib_per_sec: 100       # disk read rate limit (0 = unlimited)

Server Architecture

vger includes a dedicated backup server (vger-server) for features that dumb storage (S3/WebDAV) cannot provide. The server stores data on its local filesystem, and TLS is handled by a reverse proxy. All data remains client-side encrypted — the server is opaque storage that understands repo structure but never has the encryption key.

vger CLI (client)        reverse proxy (TLS)     vger-server
       │                       │                       │
       │──── HTTPS ───────────►│──── HTTP ────────────►│
       │                       │                       │──► local filesystem

Crate layout

| Component | Location | Purpose |
|---|---|---|
| vger-server | crates/vger-server/ | axum HTTP server with all server-side features |
| RestBackend | crates/vger-core/src/storage/rest_backend.rs | StorageBackend impl over HTTP (behind backend-rest feature) |

REST API

Storage endpoints map 1:1 to the StorageBackend trait:

| Method | Path | Maps to | Notes |
|---|---|---|---|
| GET | /{repo}/{*path} | get(key) | 200 + body or 404. With Range header → get_range (returns 206). |
| HEAD | /{repo}/{*path} | exists(key) | 200 (with Content-Length) or 404 |
| PUT | /{repo}/{*path} | put(key, data) | Raw bytes body. 201/204. Rejected if over quota. |
| DELETE | /{repo}/{*path} | delete(key) | 204 or 404. Rejected with 403 in append-only mode. |
| GET | /{repo}/{*path}?list | list(prefix) | JSON array of key strings |
| POST | /{repo}/{*path}?mkdir | create_dir(key) | 201 |

Admin endpoints:

| Method | Path | Description |
|---|---|---|
| POST | /{repo}?init | Create repo directory scaffolding (256 shard dirs, etc.) |
| POST | /{repo}?batch-delete | Body: JSON array of keys to delete |
| POST | /{repo}?repack | Server-side compaction (see below) |
| GET | /{repo}?stats | Size, object count, last backup timestamp, quota usage |
| GET | /{repo}?verify-structure | Structural integrity check (pack magic, shard naming) |
| GET | / | List all repos |
| GET | /health | Uptime, disk space, version (unauthenticated) |

Lock endpoints:

| Method | Path | Description |
|---|---|---|
| POST | /{repo}/locks/{id} | Acquire lock (body: {"hostname": "...", "pid": 123}) |
| DELETE | /{repo}/locks/{id} | Release lock |
| GET | /{repo}/locks | List active locks |

Authentication

Single shared bearer token, constant-time compared via the subtle crate. Configured in vger-server.toml:

[server]
listen = "127.0.0.1:8484"
data_dir = "/var/lib/vger"
token = "some-secret-token"

GET /health is the only unauthenticated endpoint.

Append-Only Enforcement

When append_only = true:

  • DELETE on any path → 403 Forbidden
  • PUT to existing packs/** keys → 403 (no overwriting pack files)
  • PUT to manifest, index → allowed (updated every backup)
  • batch-delete → 403
  • repack with delete_after: true → 403

This prevents a compromised client from destroying backup history.
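The rules above amount to a small decision function. A sketch, with the request model simplified to a method, a path, and an "already exists" flag:

```rust
// Sketch of the append-only policy: what a server with append_only =
// true rejects with 403. Simplified; the real server also handles the
// batch-delete and repack admin endpoints.

enum Method { Put, Delete }

fn allowed(append_only: bool, method: Method, path: &str, exists: bool) -> bool {
    if !append_only {
        return true;
    }
    match method {
        Method::Delete => false,                              // no deletes at all
        Method::Put if path.starts_with("packs/") => !exists, // no pack overwrites
        Method::Put => true,                                  // manifest/index updates ok
    }
}

fn main() {
    let pack = "packs/ab/ab01cd02";
    assert!(!allowed(true, Method::Delete, pack, true));  // DELETE → 403
    assert!(!allowed(true, Method::Put, pack, true));     // overwrite pack → 403
    assert!(allowed(true, Method::Put, pack, false));     // new pack → ok
    assert!(allowed(true, Method::Put, "manifest", true)); // manifest update → ok
    println!("append-only policy ok");
}
```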

Quota Enforcement

Per-repo storage quota (quota_bytes in config). Server tracks total bytes per repo (initialized by scanning data_dir on startup, updated on PUT/DELETE). When a PUT would exceed the limit → 413 Payload Too Large.
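A sketch of that check (saturating_add guards against overflow; 0 means unlimited, matching the config comment):

```rust
// Sketch of the quota check on PUT: reject when the write would push the
// repo over quota_bytes. Returns the HTTP status to send on rejection.

fn check_quota(used: u64, incoming: u64, quota_bytes: u64) -> Result<(), u16> {
    if quota_bytes != 0 && used.saturating_add(incoming) > quota_bytes {
        Err(413) // Payload Too Large
    } else {
        Ok(())
    }
}

fn main() {
    assert_eq!(check_quota(900, 50, 1000), Ok(()));   // fits
    assert_eq!(check_quota(900, 200, 1000), Err(413)); // would exceed quota
    assert_eq!(check_quota(900, 200, 0), Ok(()));      // 0 = unlimited
    println!("quota ok");
}
```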

Backup Freshness Monitoring

The server detects completed backups by observing PUT /{repo}/manifest, which is always the last write in a backup. It updates the last_backup_at timestamp, exposed via the stats endpoint:

{
  "total_bytes": 1073741824,
  "total_objects": 234,
  "total_packs": 42,
  "last_backup_at": "2026-02-11T14:30:00Z",
  "quota_bytes": 5368709120,
  "quota_used_bytes": 1073741824
}

Lock Management with TTL

Server-managed locks replace advisory JSON lock files:

  • Locks are held in memory with a configurable TTL (default 1 hour)
  • A background task (tokio interval, every 60 seconds) removes expired locks
  • Prevents orphaned locks from crashed clients
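A sketch of the sweep, using an in-memory map from lock id to acquisition Instant (real vger runs this from a tokio interval task):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Sketch of server-side lock expiry: locks carry an acquisition time,
// and a periodic sweep drops any lock older than the TTL.

struct LockTable {
    ttl: Duration,
    locks: HashMap<String, Instant>, // lock id → acquired at
}

impl LockTable {
    fn sweep(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.locks
            .retain(|_, acquired| now.duration_since(*acquired) < ttl);
    }
}

fn main() {
    let start = Instant::now();
    let mut table = LockTable {
        ttl: Duration::from_secs(3600), // default 1 hour
        locks: HashMap::from([
            ("old".to_string(), start),
            ("fresh".to_string(), start + Duration::from_secs(7000)),
        ]),
    };
    // Simulate a sweep two hours in: "old" has expired, "fresh" has not.
    table.sweep(start + Duration::from_secs(7200));
    assert_eq!(table.locks.len(), 1);
    assert!(table.locks.contains_key("fresh"));
    println!("expired locks removed");
}
```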

Server-Side Compaction (Repack)

This is the key feature that justifies a custom server. Pack files with high dead-blob ratios are repacked server-side, avoiding multi-gigabyte downloads over the network.

How it works (no encryption key needed):

Pack files contain encrypted blobs. Compaction does encrypted passthrough — it reads blobs by offset and repacks them without decrypting.

  1. Client opens repo, downloads and decrypts the index (small)
  2. Client analyzes pack headers to identify live vs dead blobs (via range reads)
  3. Client sends POST /{repo}?repack with a plan:
    {
      "operations": [
        {
          "source_pack": "packs/ab/ab01cd02...",
          "keep_blobs": [
            {"offset": 9, "length": 4096},
            {"offset": 8205, "length": 2048}
          ],
          "delete_after": true
        }
      ]
    }
    
  4. Server reads live blobs from disk, writes new pack files (magic + version + length-prefixed blobs, no trailing header), deletes old packs
  5. Server returns new pack keys and blob offsets so the client can update its index
  6. Client writes the encrypted pack header separately, updates ChunkIndex, calls save_state

For packs with keep_blobs: [], the server simply deletes the pack.
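Because blobs stay encrypted, the server-side copy in step 4 is plain byte slicing. A sketch (magic bytes and header handling omitted):

```rust
// Sketch of encrypted passthrough repacking: live blobs are copied out
// of the old pack by (offset, length) without decrypting, into a new
// pack body. The new offsets are reported back so the client can update
// its ChunkIndex.

struct KeepBlob {
    offset: usize,
    length: usize,
}

fn repack(source_pack: &[u8], keep: &[KeepBlob]) -> (Vec<u8>, Vec<usize>) {
    let mut new_pack = Vec::new();
    let mut new_offsets = Vec::new();
    for blob in keep {
        new_offsets.push(new_pack.len());
        new_pack.extend_from_slice(&source_pack[blob.offset..blob.offset + blob.length]);
    }
    (new_pack, new_offsets)
}

fn main() {
    // Old pack body: [dead 4 bytes][live "AAAA"][dead 2 bytes][live "BB"]
    let old = b"xxxxAAAAyyBB";
    let keep = [
        KeepBlob { offset: 4, length: 4 },
        KeepBlob { offset: 10, length: 2 },
    ];
    let (new_pack, offsets) = repack(old, &keep);
    assert_eq!(new_pack, b"AAAABB"); // dead bytes dropped
    assert_eq!(offsets, vec![0, 4]); // new offsets for the index update
    println!("repacked {} bytes", new_pack.len());
}
```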

Structural Integrity Check

GET /{repo}?verify-structure checks (no encryption key needed):

  • Required files exist (config, manifest, index, keys/repokey)
  • Pack files follow <2-char-hex>/<64-char-hex> shard pattern
  • No zero-byte packs (minimum valid = magic 9 bytes + header length 4 bytes = 13 bytes)
  • Pack files start with VGERPACK\0 magic bytes
  • Reports stale lock count, total size, and pack counts

Full content verification (decrypt + recompute chunk IDs) stays client-side via vger check --verify-data.
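Two of these checks are easy to sketch without any key material (the helpers below are illustrative, not vger's actual code):

```rust
// Sketch of two structural checks: the <2-char-hex>/<64-char-hex> shard
// path pattern, and the VGERPACK\0 magic prefix. Neither needs the
// encryption key.

fn is_lower_hex(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_hexdigit() && !c.is_ascii_uppercase())
}

fn valid_shard_path(path: &str) -> bool {
    match path.split('/').collect::<Vec<_>>().as_slice() {
        [shard, id] => {
            shard.len() == 2 && id.len() == 64 && is_lower_hex(shard) && is_lower_hex(id)
        }
        _ => false,
    }
}

fn valid_pack_prefix(bytes: &[u8]) -> bool {
    bytes.starts_with(b"VGERPACK\0")
}

fn main() {
    let id = "ab".repeat(32); // 64 lowercase hex chars
    assert!(valid_shard_path(&format!("ab/{id}")));
    assert!(!valid_shard_path("abc/not-a-pack-id"));
    assert!(valid_pack_prefix(b"VGERPACK\0rest-of-pack"));
    assert!(!valid_pack_prefix(b"GARBAGE"));
    println!("structure ok");
}
```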

Server Configuration

[server]
listen = "127.0.0.1:8484"
data_dir = "/var/lib/vger"
token = "some-secret-token"
append_only = false
log_format = "json"              # "json" or "pretty"

# Optional limits
# quota_bytes = 0                # per-repo quota. 0 = unlimited.
# lock_ttl_seconds = 3600        # default lock TTL

RestBackend (Client Side)

crates/vger-core/src/storage/rest_backend.rs implements StorageBackend using ureq (sync HTTP client, behind backend-rest feature flag). Connection-pooled. Maps each trait method to the corresponding HTTP verb. get_range sends a Range: bytes=<start>-<end> header and expects 206 Partial Content. Also exposes extra methods beyond the trait: batch_delete(), repack(), acquire_lock(), release_lock(), stats().

Client config:

repositories:
  - url: https://backup.example.com/myrepo
    label: server
    rest_token: "secret-token-here"

Feature Status

Implemented

| Feature | Description |
|---|---|
| Pack files | Chunks grouped into ~32 MiB packs with dynamic sizing, separate data/tree packs |
| Retention policies | keep_daily, keep_weekly, keep_monthly, keep_yearly, keep_last, keep_within |
| delete command | Remove individual snapshots, decrement refcounts |
| prune command | Apply retention policies, remove expired snapshots |
| check command | Structural integrity + optional --verify-data for full content verification |
| Type-safe PackId | Newtype for pack file identifiers with storage_key() |
| compact command | Rewrite packs to reclaim space from orphaned blobs after delete/prune |
| REST server | axum-based backup server with auth, append-only, quotas, freshness tracking, lock TTL, server-side compaction |
| REST backend | StorageBackend over HTTP with range-read support (behind backend-rest feature) |
| Parallel pipeline | rayon for chunk compress/encrypt pipeline |
| File-level cache | inode/mtime/ctime skip for unchanged files — avoids read, chunk, compress, encrypt. Stored locally in the platform cache dir (macOS: ~/Library/Caches/vger/<repo_id>/filecache, Linux: ~/.cache/vger/…) — machine-specific, not in the repo. |

Planned / Not Yet Implemented

| Feature | Description | Priority |
|---|---|---|
| Type-safe IDs | Newtypes for SnapshotId, ManifestId | Medium |
| Snapshot filtering | By host, tag, path, date ranges | Medium |
| Async I/O | Non-blocking storage operations | Medium |
| Metrics | Prometheus/OpenTelemetry | Low |