- Overview
- Goals & Motivation
- Key Concepts
- Quickstart
- Installation & Build
- Configuration
- CLI Commands & Usage Reference
- Snapshot Lifecycle
- Deduplication & CAS Internals
- Identity & Authentication
- Peer Management
- PubSub Message Formats & Validation
- Restore Workflow
- Testing
- Docker & Orchestration
- Utility C Tool
- Shell Helpers & Entry Point
- Example File Layout After Run
- Troubleshooting & Common Issues
- Protocol Buffers
- Security Considerations
- Extension Points / Developer Notes
- Contributing
- Glossary
- License
ShadowVault is a privacy-preserving, decentralized backup agent written in Go. It snapshots filesystem data, chunks and deduplicates content, encrypts everything client-side, and synchronizes encrypted chunks and metadata across a peer-to-peer network using libp2p. There is no trusted central server: peers gossip what blocks they have, fetch missing pieces directly, and validate integrity and authenticity through signatures.
- Privacy-first backups: All data is encrypted locally before storage or exchange. No peer can read your data without the passphrase.
- Deduplicated storage: Content-addressed chunking avoids redundant uploads across snapshots.
- Decentralized sync: Data is propagated via peer-to-peer gossip and direct fetch; no single point of failure.
- Verifiable history: Snapshots are signed; block provenance is trackable.
- Resilience: Peers can fetch missing chunks from multiple holders; auto discovery and NAT handling improve reachability.
- Go: Implementation language for the main agent, CLI, P2P, and snapshot logic.
- libp2p: Used for peer discovery, pubsub gossip, direct streams (block requests), NAT traversal, and optional relaying.
- bbolt: Embedded key-value store for metadata (snapshots, peer list, block indices).
- Content Addressed Storage (CAS): Chunks are stored/encrypted and addressed by their SHA-256 hash.
- CLI: Commands for daemon, snapshot creation, restore, and peer management.
- AES-256-GCM: Authenticated encryption for chunk and snapshot payload confidentiality/integrity.
- Argon2id or scrypt: Password-based key derivation for master encryption key (depending on chosen implementation).
- Ed25519 signatures: Snapshot and protocol messages are signed to authenticate origin and prevent tampering.
- Persistent identity keys: Libp2p private key persisted encrypted for stable peer identity.
- Gossip (PubSub): Announcements of new snapshots and available block hashes.
- Direct block fetch: If a peer lacks a chunk, it opens a libp2p stream to a known holder and requests it.
- Anti-entropy: Peers reconcile missing pieces by observing announcements and querying.
- ACLs: Optional admin lists controlling who can introduce peers or snapshots.
sequenceDiagram
participant User
participant Daemon as "ShadowVault Daemon"
participant Identity as "Identity Store (Ed25519 keys / ACL)"
participant Chunker
participant Dedup as "Deduplicator"
participant Encryptor as "AES-256-GCM Encryptor"
participant LocalCAS as "Local CAS (encrypted blobs)"
participant MetaDB as "Metadata DB (bbolt)"
participant PubSub as "PubSub / Gossip"
participant Peer as "Remote Peer"
participant Restore as "Restore Agent"
participant Decryptor as "Decryptor"
participant Reconstructor as "File Reconstructor"
%% Snapshot creation
User->>Daemon: request snapshot of directory
Daemon->>Identity: load persistent identity & ACL
Daemon->>Chunker: chunk files (content-defined)
Chunker->>Dedup: send chunk hashes
Dedup->>LocalCAS: check existing encrypted chunks
alt chunk missing locally
Dedup->>Encryptor: encrypt chunk with derived key
Encryptor->>LocalCAS: store encrypted chunk (SHA-256 address)
else chunk already present
Dedup-->>LocalCAS: reuse existing blob
end
Daemon->>MetaDB: assemble snapshot metadata (chunk list, parent, timestamps)
Daemon->>Identity: sign snapshot metadata with Ed25519
Daemon->>MetaDB: persist signed snapshot descriptor
Daemon->>PubSub: publish SnapshotAnnouncement
Daemon->>PubSub: publish BlockAnnounce for available chunk hashes
%% Peer synchronization
PubSub->>Peer: receive SnapshotAnnouncement + BlockAnnounces
Peer->>Identity: verify snapshot signature against ACL/public key
alt signature valid and allowed
Peer->>LocalCAS: check which announced chunks are missing
alt has missing chunks
Peer->>Peer: open direct libp2p stream to known holder\nrequest specific chunk
Peer->>LocalCAS: if requested, send encrypted chunk
Peer-->>Peer: receive encrypted chunk
Peer->>LocalCAS: store received encrypted chunk
end
Peer->>MetaDB: update block availability index / peer cache
else invalid announcement
Peer-->>PubSub: ignore / log rejection
end
%% Restore workflow
User->>Restore: request restore of snapshot ID
Restore->>MetaDB: fetch snapshot metadata
Restore->>Identity: verify snapshot signature
Restore->>LocalCAS: for each chunk in snapshot, check local presence
alt chunk present
LocalCAS-->>Restore: provide encrypted chunk
else chunk missing
Restore->>PubSub: query gossip for holders
Restore->>Peer: direct fetch chunk via libp2p stream
Peer-->>Restore: send encrypted chunk
Restore->>LocalCAS: cache fetched encrypted chunk
end
Restore->>Decryptor: decrypt chunk(s) using derived key
Decryptor->>Reconstructor: supply plaintext pieces
Reconstructor->>User: reassemble original files with metadata
# Build and start daemon, create identity and default config, snapshot a directory
./entrypoint.sh config.yaml /path/to/important/data
# List known peers
./bin/peerctl -c config.yaml -p "yourpass" list
# Add a peer (multiaddr)
./bin/peerctl add /ip4/1.2.3.4/tcp/9000/p2p/<peerID> -c config.yaml -p "yourpass"
# Restore a snapshot
./bin/restore-agent restore <snapshot-id> restored/ -c config.yaml -p "yourpass"
- Go 1.21+
- GCC (for the auxiliary C tool)
- Docker (optional, for containerized run)
- Make
git clone <repo-url> shadowvault
cd shadowvault
make build
This produces:
- bin/backup-agent: main daemon/snapshot CLI
- bin/restore-agent: snapshot restore CLI
- bin/peerctl: peer management CLI
make test
Primary configuration lives in config.yaml (created automatically by entrypoint.sh if absent). Example:
repository_path: "./data"
listen_port: 9000
peer_bootstrap:
  - "/ip4/127.0.0.1/tcp/9001/p2p/QmSomePeerID"
nat_traversal:
  enable_auto_relay: true
  enable_hole_punching: true
snapshot:
  min_chunk_size: 2048
  max_chunk_size: 65536
  avg_chunk_size: 8192
acl:
  admins:
    - "base64-ed25519-pubkey..."
Defaults are applied when fields are missing.
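As a rough illustration of default handling, the sketch below fills zero-valued fields and checks that the chunk sizes are ordered. The type and function names (Config, ApplyDefaults, Validate) are hypothetical, and the fallback values are simply copied from the example above; they are not confirmed project defaults.

```go
package main

import "fmt"

// SnapshotConfig mirrors the snapshot section of the example config.yaml.
type SnapshotConfig struct {
	MinChunkSize int
	MaxChunkSize int
	AvgChunkSize int
}

// Config holds a subset of the example fields. Names are assumptions.
type Config struct {
	RepositoryPath string
	ListenPort     int
	Snapshot       SnapshotConfig
}

// ApplyDefaults fills fields that were left at their zero value.
// The fallback values are borrowed from the example config above.
func ApplyDefaults(c *Config) {
	if c.RepositoryPath == "" {
		c.RepositoryPath = "./data"
	}
	if c.ListenPort == 0 {
		c.ListenPort = 9000
	}
	if c.Snapshot.MinChunkSize == 0 {
		c.Snapshot.MinChunkSize = 2048
	}
	if c.Snapshot.AvgChunkSize == 0 {
		c.Snapshot.AvgChunkSize = 8192
	}
	if c.Snapshot.MaxChunkSize == 0 {
		c.Snapshot.MaxChunkSize = 65536
	}
}

// Validate rejects ports outside 1..65535 and chunk sizes that do not
// satisfy 0 < min <= avg <= max.
func Validate(c Config) error {
	if c.ListenPort < 1 || c.ListenPort > 65535 {
		return fmt.Errorf("listen_port %d out of range", c.ListenPort)
	}
	s := c.Snapshot
	if s.MinChunkSize <= 0 || s.MinChunkSize > s.AvgChunkSize || s.AvgChunkSize > s.MaxChunkSize {
		return fmt.Errorf("invalid chunk sizes: min=%d avg=%d max=%d",
			s.MinChunkSize, s.AvgChunkSize, s.MaxChunkSize)
	}
	return nil
}

func main() {
	cfg := Config{ListenPort: 9100} // everything else is left unset
	ApplyDefaults(&cfg)
	if err := Validate(cfg); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```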
# Start daemon
./bin/backup-agent daemon -c config.yaml -p "passphrase"
# Take snapshot of a directory
./bin/backup-agent snapshot /path/to/dir -c config.yaml -p "passphrase"
# Restore snapshot by ID to target directory
./bin/restore-agent restore <snapshot-id> <target-dir> -c config.yaml -p "passphrase"
# List stored/known peers
./bin/peerctl list -c config.yaml -p "passphrase"
# Add a peer by multiaddr
./bin/peerctl add /ip4/1.2.3.4/tcp/9000/p2p/<peerID> -c config.yaml -p "passphrase"
# Remove a stored peer
./bin/peerctl remove <peerID> -c config.yaml -p "passphrase"
Flags:
-c, --config    path to config.yaml
-p, --pass      encryption passphrase
- Chunking: Files in the target directory are read with content-defined chunking (configurable min/avg/max) to produce variable-sized pieces.
- Deduplication: Each chunk is hashed (SHA-256) and if already present locally, skipped.
- Encryption: Chunks are encrypted with AES-256-GCM using a key derived from the user passphrase.
- Storage: Encrypted chunks are stored in CAS (via bbolt or on-disk object layout).
- Snapshot metadata: A snapshot descriptor listing chunk hashes, parent snapshot (optional), timestamps, and provenance is assembled and signed.
- Announcement: Signed snapshot and block availability are gossip-published to peers via pubsub.
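The chunking and deduplication steps can be sketched as follows. This is a minimal illustration, not the project's chunker: it uses a simple gear-style rolling hash as a stand-in, assumes the average size is a power of two, and keeps chunks in an in-memory map where the agent would encrypt and write them to the CAS. Split and Dedup are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// gear holds one pseudo-random value per byte value, produced by a
// fixed-seed splitmix64 so that chunk boundaries are reproducible.
var gear = func() [256]uint64 {
	var t [256]uint64
	x := uint64(0x9E3779B97F4A7C15)
	for i := range t {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		t[i] = z ^ (z >> 31)
	}
	return t
}()

// Split cuts data where the rolling hash matches a mask derived from
// avgSize, never before minSize bytes and never after maxSize bytes.
// avgSize must be a power of two.
func Split(data []byte, minSize, avgSize, maxSize int) [][]byte {
	mask := uint64(avgSize - 1)
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		h = (h << 1) + gear[b]
		n := i - start + 1
		if (n >= minSize && h&mask == 0) || n >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing partial chunk
	}
	return chunks
}

// Dedup hashes every chunk with SHA-256, records the ordered hash list,
// and stores only chunks whose hash is not yet in store.
func Dedup(chunks [][]byte, store map[string][]byte) (hashes []string, added int) {
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		id := hex.EncodeToString(sum[:])
		hashes = append(hashes, id)
		if _, ok := store[id]; !ok {
			store[id] = c // the real agent would encrypt before storing
			added++
		}
	}
	return hashes, added
}

func main() {
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(i*31 + i/7) // deterministic filler data
	}
	store := map[string][]byte{}
	chunks := Split(data, 2048, 8192, 65536)
	_, first := Dedup(chunks, store)
	_, second := Dedup(chunks, store)
	fmt.Printf("chunks=%d stored on first run=%d, on second run=%d\n",
		len(chunks), first, second)
}
```

Because boundaries depend on content rather than offsets, inserting bytes near the start of a file shifts only the nearby chunks, which is what lets later snapshots reuse most stored chunks.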
- Chunk Identification: The SHA-256 of the encrypted chunk is used as its content address.
- Storage: Chunks are stored under objects/<first-two>/<rest> or in a key-value bucket.
- Snapshot Metadata: Includes chunk list, parent link, signer public key, signature, and arbitrary metadata (e.g., source path).
- Garbage Collection: Not automatic; implement reference counting or periodic pruning in extensions.
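A minimal sketch of the encrypt-then-address flow, assuming a 32-byte key has already been derived from the passphrase (key derivation is left out here). The helper names Seal, Open, Address and ObjectPath are hypothetical, and prepending the nonce to the ciphertext is an assumption about the blob layout.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"path/filepath"
)

// newGCM builds an AES-256-GCM AEAD and insists on a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 needs a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts plaintext and returns nonce || ciphertext || tag.
func Seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open reverses Seal and fails if the blob was modified.
func Open(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("blob too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// Address returns the hex SHA-256 of an encrypted blob.
func Address(blob []byte) string {
	sum := sha256.Sum256(blob)
	return hex.EncodeToString(sum[:])
}

// ObjectPath maps a 64-character hex address to objects/<first-two>/<rest>.
func ObjectPath(root, addr string) string {
	return filepath.Join(root, "objects", addr[:2], addr[2:])
}

func main() {
	key := make([]byte, 32) // stand-in for the passphrase-derived key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	blob, err := Seal(key, []byte("chunk contents"))
	if err != nil {
		panic(err)
	}
	fmt.Println("stored at:", ObjectPath("data", Address(blob)))
	plain, err := Open(key, blob)
	fmt.Println(string(plain), err)
}
```

Note that with a random nonce, two encryptions of the same plaintext produce different blobs and therefore different addresses, so in this sketch the duplicate check would have to run on a plaintext-derived hash before encryption.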
- Persistent Identity: The libp2p private key is created once and saved at repository_path/identity.key.
- Snapshot Signatures: Snapshots are signed with Ed25519 (embedded in versioning.Snapshot.Signature) and verified before acceptance.
- ACL: Admin public keys (base64) control who may perform peer introductions or snapshot promotion.
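The "created once, then reused" behavior of the identity key might look roughly like this. The sketch uses a plain Ed25519 key from the standard library rather than libp2p's key types, and storing the raw 32-byte seed is an assumption about the file format; LoadOrCreateIdentity is a hypothetical name.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// LoadOrCreateIdentity returns the key stored at <repo>/identity.key and
// generates it on first use. The on-disk encoding (raw seed) is assumed.
func LoadOrCreateIdentity(repo string) (ed25519.PrivateKey, error) {
	path := filepath.Join(repo, "identity.key")
	seed, err := os.ReadFile(path)
	if err == nil {
		if len(seed) != ed25519.SeedSize {
			return nil, fmt.Errorf("identity key has %d bytes, want %d",
				len(seed), ed25519.SeedSize)
		}
		return ed25519.NewKeyFromSeed(seed), nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return nil, err
	}
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(repo, 0o700); err != nil {
		return nil, err
	}
	// Mode 0600 keeps the key readable by the owner only.
	if err := os.WriteFile(path, priv.Seed(), 0o600); err != nil {
		return nil, err
	}
	return priv, nil
}

func main() {
	dir, err := os.MkdirTemp("", "shadowvault-identity")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	first, err := LoadOrCreateIdentity(dir)
	if err != nil {
		panic(err)
	}
	second, err := LoadOrCreateIdentity(dir) // second call reads the file
	if err != nil {
		panic(err)
	}

	// Sign with the first handle, verify with the reloaded public key.
	msg := []byte("snapshot descriptor bytes")
	sig := ed25519.Sign(first, msg)
	ok := ed25519.Verify(second.Public().(ed25519.PublicKey), msg, sig)
	fmt.Println("same identity after reload:", first.Equal(second))
	fmt.Println("signature verifies:", ok)
}
```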
- Stored in the metadata DB (bbolt) under the peers bucket.
- Peers can be added manually with peerctl add or auto-discovered via DHT/rendezvous if enabled.
- Peer removal cleans stored records but does not retroactively invalidate past data (chunks remain).
Core message envelope used in gossip:
{
  "type": "snapshot_announce" | "block_announce" | "peer_add" | "peer_remove",
  "payload": { /* type-specific struct */ },
  "sig": "<base64 signature over type||payload>",
  "pubkey": "<base64-ed25519 public key of signer>"
}
- SnapshotAnnouncement: Carries a full signed snapshot descriptor; peers validate the embedded signature before storing.
- BlockAnnounce: Informs network a peer has chunk with given hash.
- PeerAdd / PeerRemove: Introduce or revoke peers; include signatures to prevent spoofing.
Validation steps:
- Decode pubkey and sig.
- Reconstruct the signing context (type + raw payload).
- Verify the signature (Ed25519).
- Accept or reject based on ACL (for sensitive types).
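The validation steps above can be sketched with the standard library alone. Envelope, Sign, Validate and NewSigner are hypothetical names; treating only peer_add and peer_remove as ACL-guarded, and using standard (padded) base64, are assumptions made for the example.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope mirrors the gossip message envelope.
type Envelope struct {
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
	Sig     string          `json:"sig"`
	PubKey  string          `json:"pubkey"`
}

// sensitive lists the message types assumed to require an ACL check.
var sensitive = map[string]bool{"peer_add": true, "peer_remove": true}

// signingContext concatenates the type and the raw payload (type||payload).
func signingContext(typ string, payload []byte) []byte {
	return append([]byte(typ), payload...)
}

// NewSigner returns a fresh key and its base64 public key, in the form
// it would take in the acl.admins list.
func NewSigner() (ed25519.PrivateKey, string, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, "", err
	}
	return priv, base64.StdEncoding.EncodeToString(pub), nil
}

// Sign builds a signed envelope for the given type and raw payload.
func Sign(priv ed25519.PrivateKey, typ string, payload []byte) Envelope {
	pub := priv.Public().(ed25519.PublicKey)
	sig := ed25519.Sign(priv, signingContext(typ, payload))
	return Envelope{
		Type:    typ,
		Payload: payload,
		Sig:     base64.StdEncoding.EncodeToString(sig),
		PubKey:  base64.StdEncoding.EncodeToString(pub),
	}
}

// Validate decodes the key and signature, rebuilds the signing context,
// verifies the signature, then applies the ACL to sensitive types.
func Validate(env Envelope, admins map[string]bool) error {
	pub, err := base64.StdEncoding.DecodeString(env.PubKey)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		return errors.New("malformed pubkey")
	}
	sig, err := base64.StdEncoding.DecodeString(env.Sig)
	if err != nil || len(sig) != ed25519.SignatureSize {
		return errors.New("malformed signature")
	}
	if !ed25519.Verify(ed25519.PublicKey(pub), signingContext(env.Type, env.Payload), sig) {
		return errors.New("signature mismatch")
	}
	if sensitive[env.Type] && !admins[env.PubKey] {
		return errors.New("signer not in ACL")
	}
	return nil
}

func main() {
	priv, pub, err := NewSigner()
	if err != nil {
		panic(err)
	}
	env := Sign(priv, "peer_add", []byte(`{"peer_id":"example"}`))
	out, _ := json.Marshal(env)
	fmt.Println(string(out))
	fmt.Println("admin signer:", Validate(env, map[string]bool{pub: true}))
	fmt.Println("unknown signer:", Validate(env, nil))
}
```

Plain concatenation of type and payload is ambiguous in principle (the boundary between the two is not encoded); a length prefix or a fixed separator would remove that ambiguity if the wire format allows it.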
- Specify snapshot ID to restore.
- Snapshot is loaded and its signature verified.
- For each chunk hash in the snapshot:
  - If present locally, use it.
  - Otherwise, consult known block announcements and attempt fetching from peers via direct stream protocol.
- Decrypt each chunk and reconstruct files.
- Restore filesystem metadata (mode, timestamps).
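The per-chunk part of this workflow could be sketched as below. The Fetcher interface is a hypothetical stand-in for the libp2p stream protocol, the local store is an in-memory map, and decryption is passed in as a function so the example stays self-contained.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Fetcher abstracts "ask a peer for a chunk" (hypothetical interface).
type Fetcher interface {
	Fetch(hash string) ([]byte, error)
}

// MapFetcher is an in-memory stand-in for a remote peer.
type MapFetcher map[string][]byte

// Fetch returns the blob the simulated peer holds for hash, if any.
func (m MapFetcher) Fetch(hash string) ([]byte, error) {
	b, ok := m[hash]
	if !ok {
		return nil, errors.New("peer does not hold " + hash)
	}
	return b, nil
}

// Address returns the hex SHA-256 of a blob.
func Address(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Resolve prefers the local store, then tries each peer in order. A
// fetched blob is cached only if it hashes to the requested address.
func Resolve(hash string, local map[string][]byte, peers []Fetcher) ([]byte, error) {
	if b, ok := local[hash]; ok {
		return b, nil
	}
	for _, p := range peers {
		b, err := p.Fetch(hash)
		if err != nil || Address(b) != hash {
			continue // unavailable or corrupted: try the next holder
		}
		local[hash] = b
		return b, nil
	}
	return nil, fmt.Errorf("chunk %s unavailable", hash)
}

// Reassemble resolves every chunk in order, decrypts it, and
// concatenates the plaintext.
func Reassemble(hashes []string, local map[string][]byte, peers []Fetcher,
	decrypt func([]byte) ([]byte, error)) ([]byte, error) {
	var out bytes.Buffer
	for _, h := range hashes {
		blob, err := Resolve(h, local, peers)
		if err != nil {
			return nil, err
		}
		plain, err := decrypt(blob)
		if err != nil {
			return nil, fmt.Errorf("decrypt %s: %w", h, err)
		}
		out.Write(plain)
	}
	return out.Bytes(), nil
}

func main() {
	a, b := []byte("first part, "), []byte("second part")
	local := map[string][]byte{Address(a): a}   // one chunk is already local
	peer := MapFetcher{Address(b): b}           // the other lives on a peer
	identity := func(x []byte) ([]byte, error) { return x, nil } // no real crypto here

	data, err := Reassemble([]string{Address(a), Address(b)}, local, []Fetcher{peer}, identity)
	fmt.Println(string(data), err)
}
```

Checking the hash before caching means a peer that advertises a chunk but returns different bytes is skipped in favor of the next holder.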
Example:
./bin/restore-agent restore snapshot-abc123 restored/ -c config.yaml -p "yourpass"
Unit tests are included for critical modules:
- internal/crypto/crypto_test.go: encryption/decryption and hashing.
- internal/chunker/chunker_test.go: chunk boundary correctness and edge cases.
- internal/identity/identity_test.go: persistent identity creation and validation.
Run:
make test
Or directly:
go test ./... -v
docker build -t shadowvault:latest .
# "$PASSPHRASE" is expanded by the host shell, so it must be set there first
export PASSPHRASE=yourpass
docker run --rm -v "$(pwd)/data":/data -v "$(pwd)/config.yaml":/app/config.yaml:ro -e PASSPHRASE shadowvault:latest daemon -c /app/config.yaml -p "$PASSPHRASE"
docker compose up
(This uses docker-compose.yml to spin up node1 and node2, share bootstrap configuration, and run daemons.)
docker exec -it backupagent_node1 /bin/sh -c "./bin/backup-agent snapshot /data/to/backup -c /app/config.yaml -p yourpass"
docker exec -it backupagent_node1 /bin/sh -c "./bin/restore-agent restore <snapshot-id> /restored -c /app/config.yaml -p yourpass"
Mount host directories into container to persist:
- Snapshot data and identity under repository_path (e.g., ./data/node1)
- Config via bind mount.
tools/hashfile.c is a small companion tool, compiled against OpenSSL, that computes the SHA-256 of arbitrary files (helpful for independent verification).
Compile:
gcc -o tools/hashfile tools/hashfile.c -lcrypto
Usage:
./tools/hashfile /path/to/snapshot.json
Outputs hex digest + filename.
- scripts/bootstrap.sh: Initializes default config and identity by briefly spinning up the agent.
- scripts/snapshot.sh: Wrapper to snapshot a path.
- scripts/restore.sh: Wrapper to restore a snapshot.
- entrypoint.sh: Root orchestrator that builds binaries, ensures config, launches daemon, and optionally takes a first snapshot.
Make executable:
chmod +x entrypoint.sh scripts/*.sh
.
├── config.yaml
├── entrypoint.sh
├── bin/
│   ├── backup-agent
│   ├── restore-agent
│   └── peerctl
├── data/                 # repository_path
│   ├── identity.key      # persistent libp2p key
│   ├── metadata.db       # bbolt DB (snapshots, peers, blocks)
│   └── snapshots/        # encrypted snapshot metadata
├── snapshots/            # (optional local snapshot working trees)
├── .shadowvault/         # if alternate layout used
├── scripts/
│   ├── bootstrap.sh
│   ├── snapshot.sh
│   └── restore.sh
├── tools/
│   └── hashfile          # compiled C helper
└── README.md             # this document
Problem | Likely Cause | Remedy |
---|---|---|
Snapshot fails with read errors | Permissions or missing files | Check file access, run with sufficient privileges |
Cannot fetch chunk from peer | Peer offline / no announcement | Ensure peer is connected, check gossip logs, add via peerctl |
Signature validation fails | Passphrase mismatch / tampered snapshot | Verify passphrase; reject snapshot if integrity compromised |
Identity changes unexpectedly | Identity key deleted or corrupted | Restore identity.key backup; avoid deleting it |
Peer not discovered | DHT/bootstrap misconfig | Ensure bootstrap addresses are correct and reachable |
Cache inconsistency on restore | Corrupted local chunk | Delete affected chunk and allow re-fetch from another peer |
ShadowVault defines its on-wire and on-disk message formats in Protobuf, organized under proto/:
proto/
  common.proto
  snapshot.proto
  block.proto
  peer.proto
  auth.proto
  identity.proto
  service.proto
Install the Protobuf plugins for Go:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
Then from the project root run:
protoc --go_out=. --go-grpc_out=. proto/*.proto
This will generate Go packages under github.com/yourusername/shadowvault/proto/...
- common.proto
  - Ack: simple acknowledgment wrapper (ok + message).
- snapshot.proto
  - FileEntry: path, metadata and list of chunk hashes.
  - SnapshotMetadata: signed snapshot descriptor (ID, parent, timestamp, files, signer, signature).
  - SnapshotAnnouncement: wraps the above for gossip/pubsub.
- block.proto
  - BlockAnnounce: tell peers "I have chunk <hash>".
  - BlockRequest: signed request for a chunk.
  - BlockResponse: signed response carrying the encrypted payload.
- peer.proto
  - PeerInfo: peer_id + multiaddrs.
  - PeerAdd / PeerRemove: signed introductions or removals.
  - PeerList: enumeration of known peers.
- auth.proto
  - ACL: list of admin public keys.
  - SignedMessage: generic wrapper (payload + signature + pubkey).
- identity.proto
  - Identity: peer identity record (peer_id + pubkey_base64).
- service.proto
  - ShadowVault gRPC service: RPCs for snapshot announce, block request/response, peer add/remove, list peers.
- Local passphrase: The encryption key is derived from the passphrase; use high-entropy passphrases and protect them.
- Identity key: Stored unencrypted by default; restrict filesystem permissions (0600). Optionally extend to wrap with passphrase.
- Snapshot authenticity: Signing prevents snapshot tampering; always verify signature on restore.
- Peer trust: Gossip and block availability are unauthenticated unless guarded via ACL. Malicious peers could advertise bogus availability; the integrity check fails during fetch if the data doesn't decrypt or a hash mismatch occurs.
- Replay / rollback: Snapshot history is linear but not globally ordered; you may layer version pinning if needed.
- Denial of Service: A flood of bogus block requests could be mitigated by rate-limiting or proof-of-work in extensions.
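As one possible shape for the rate-limiting extension mentioned above, here is a small per-peer token bucket. It is not part of ShadowVault; Limiter, NewLimiter and Allow are hypothetical names, and the rate and burst values in main are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket tracks the remaining tokens for one peer.
type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter gives every peer its own token bucket: up to burst requests
// at once, refilled at rate tokens per second.
type Limiter struct {
	mu      sync.Mutex
	rate    float64
	burst   float64
	buckets map[string]*bucket
}

// NewLimiter creates a limiter with the given refill rate and burst size.
func NewLimiter(rate, burst float64) *Limiter {
	return &Limiter{rate: rate, burst: burst, buckets: map[string]*bucket{}}
}

// Allow reports whether peerID may issue one more block request now.
func (l *Limiter) Allow(peerID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	b, ok := l.buckets[peerID]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[peerID] = b
	}
	// Refill according to elapsed time, capped at the burst size.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now

	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	limiter := NewLimiter(2, 5) // 2 requests/second, bursts of 5
	allowed := 0
	for i := 0; i < 20; i++ {
		if limiter.Allow("peerA") {
			allowed++
		}
	}
	fmt.Printf("peerA: %d of 20 back-to-back requests allowed\n", allowed)
	fmt.Println("peerB first request allowed:", limiter.Allow("peerB"))
}
```

A block-request stream handler would call Allow with the remote peer ID before doing any lookup work and drop or delay the request when it returns false.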
- Advanced chunker: Replace simple content-defined boundary logic with full Rabin fingerprinting.
- Remote CAS: Overlay S3, IPFS, or other backends for wider distribution.
- Snapshot diffing: Visualize differences between snapshots to show added/removed chunks.
- Gossip compression: Batch announcements or use bloom filters to reduce chatter.
- Access control: Fine-grained capabilities per snapshot or time-limited tokens.
- GUI/dashboard: Visualize peers, snapshots, and integrity status.
- Metric exports: Prometheus / telemetry integration for health and sync stats.
- Fork the repository.
- Create a feature branch (e.g., feature/remote-cas).
- Add or update tests demonstrating the new behavior.
- Submit a pull request with a clear description and rationale.
Areas of high impact:
- Parallelized restore/snapshot execution.
- Peer reputation and gossip sanitization.
- Encrypted, versioned identity/key rotation.
- Plugin system for new transport/snapshot backends.
- Chunk: A small piece of a file, determined via content-defined chunking.
- CAS: Content-addressed storage; stores data by hash to deduplicate.
- Snapshot: A signed descriptor capturing a point-in-time view of directory contents via chunk hashes.
- Peer: Another instance of ShadowVault participating in sync.
- PubSub: Gossip mechanism for announcing availability.
- ACL: Access control list governing trusted signers/admins.
- Identity Key: Libp2p private key used for peer identity and signing.
MIT License. See LICENSE for full terms.
Thank you for checking out ShadowVault! We hope it helps you securely back up and manage your data in a decentralized way. For any questions or contributions, please refer to the Contributing section.