Encryption, in plain language
If you've read How it works, you've seen us use words like "XChaCha20-Poly1305," "X25519," and "HKDF" without much explanation. This post is the prerequisite. It walks through what encryption is, the ideas it's built on, and how Vita uses each one — without assuming any background. By the end, you should be able to read the technical post and have it make sense.
What "encryption" actually is
Encryption is what happens when you scramble data so badly that the only way to unscramble it is with a specific number called a key.
The number is huge — typically 256 random bits. To put that in perspective, there are about 2²⁶² atoms in the observable universe. A 256-bit key is one of about 2²⁵⁶ possibilities. You can't guess it, and you can't unscramble without it. The math is solid. The hard part of real-world encryption is everything around the math: where keys live, who has them, how they get to where they need to be.
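To get a feel for the scale, here is a back-of-envelope calculation. The attacker speed of 10^18 guesses per second is an assumption chosen to be generous, not a measured figure.

```python
# Back-of-envelope: how long would guessing a 256-bit key take?
KEYSPACE = 2 ** 256                      # number of possible 256-bit keys
GUESSES_PER_SECOND = 10 ** 18            # assumed attacker speed (hypothetical)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
AGE_OF_UNIVERSE_YEARS = 13_800_000_000   # roughly 13.8 billion years

# On average the key turns up after searching half the keyspace.
expected_years = (KEYSPACE // 2) // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
universe_lifetimes = expected_years // AGE_OF_UNIVERSE_YEARS

print(f"expected years: about 10^{len(str(expected_years)) - 1}")
print(f"universe lifetimes: about 10^{len(str(universe_lifetimes)) - 1}")
```

Even with that generous guess rate, the expected search time is many orders of magnitude longer than the universe has existed, which is why attackers go after the key's storage instead.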
A useful mental model: encryption is a vault. The key is what opens it. Strong cryptography is "the vault is unbreakable, so the only attack is to steal the key." Most security failures in real systems are key failures, not crypto failures.
The two main flavors
There are two ways to do encryption, and Vita uses both.
Symmetric encryption
One key locks the data, the same key unlocks it. Like a deadbolt: whoever has the key can both lock the door and open it. Fast, simple, and ideal when everyone who needs access already shares the key.
Vita's workhorse here is XChaCha20-Poly1305, a modern symmetric cipher. The "XChaCha20" part scrambles your data; we'll get to the "Poly1305" part in a second. Whenever Vita writes one of your habits to disk or sends one over the network, it's been through XChaCha20.
Asymmetric encryption
Two keys this time, mathematically tied together: a public key you can hand out to anyone, and a private key you keep to yourself. What's locked with one can only be unlocked with the other.
This sounds like a strange property, but it's exactly what you need to talk to a stranger. If someone publishes their public key, you can lock a message with it that only they can read — without ever having met them in person to swap a shared key.
Asymmetric is slower than symmetric, so in practice systems use it for small jobs (proving identity, agreeing on a key) and then switch to symmetric for the bulk of the data. Vita is no different.
Two more ideas you need
Authenticated encryption (the "tamper-evident seal")
Plain symmetric encryption keeps secrets, but it doesn't notice if someone changes the ciphertext. A good attacker doesn't need to read your data; sometimes flipping a bit is enough. So modern ciphers add a tag: a short fingerprint computed from the ciphertext and the key during encryption. If anything in the sealed record is altered, even a single bit, the tag stops matching, and decryption fails loudly instead of silently returning corrupted plaintext.
This is called AEAD (Authenticated Encryption with Associated Data). In XChaCha20-Poly1305, Poly1305 is the part that computes the tag.
The practical consequence: a server hosting Vita's encrypted data can delete your records, or delay them, but it cannot modify one without your device detecting the corruption.
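The tamper-evident seal can be illustrated with a few lines of Python. This is a sketch of the idea only: it uses truncated HMAC-SHA256 from the standard library as a stand-in for Poly1305, and it treats the ciphertext as opaque bytes rather than running a real cipher.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 16  # same length as a Poly1305 tag; truncated HMAC-SHA256 stands in here

def make_tag(key: bytes, ciphertext: bytes) -> bytes:
    """Compute a short authentication tag over the ciphertext."""
    return hmac.new(key, ciphertext, hashlib.sha256).digest()[:TAG_LEN]

def verify(key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag matches the ciphertext."""
    return hmac.compare_digest(make_tag(key, ciphertext), tag)

key = secrets.token_bytes(32)
ciphertext = secrets.token_bytes(48)      # stand-in for already-encrypted bytes
tag = make_tag(key, ciphertext)

tampered = bytearray(ciphertext)
tampered[0] ^= 0x01                       # flip a single bit

print(verify(key, ciphertext, tag))       # untouched record
print(verify(key, bytes(tampered), tag))  # record with one flipped bit
```

The untouched record verifies and the flipped one does not. A real AEAD performs this check inside decryption, so the caller never sees plaintext from a record that failed it.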
Key derivation: one master key, many specialized keys
In the real world you usually have one root secret but need different keys for different purposes (encrypting today's data, encrypting a password, identifying a stream, etc.). The clean way to do this is a key derivation function.
The one Vita uses is HKDF-SHA256. You give it (root_key, label) and it gives you back a fresh key that is computationally independent of every key derived with a different label: knowing one derived key gives no practical way to work out another, or the root. Leak the derived key for one purpose and the others are still safe.
This matters because Vita doesn't use the master key directly for everything — it derives sub-keys for sub-tasks. We'll see one in a minute.
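HKDF is small enough to write out. The sketch below follows the extract-then-expand construction from RFC 5869 using only Python's standard library. The labels and the demo root key are made up for illustration, and the empty default salt is an assumption, not something Vita's documentation confirms.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes

def hkdf_sha256(root_key: bytes, label: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    # Extract: condense the input key material into a pseudorandom key.
    prk = hmac.new(salt or bytes(HASH_LEN), root_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, mixing in the label and a one-byte counter.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + label + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

root = bytes(range(32))                      # placeholder root key, demo only
key_a = hkdf_sha256(root, b"purpose-a")      # hypothetical label
key_b = hkdf_sha256(root, b"purpose-b")      # hypothetical label
print(key_a.hex())
print(key_b.hex())
```

The same root key and label always produce the same output, which is what lets two devices derive matching sub-keys without talking to each other.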
The neat trick: agreeing on a secret in public
Here's the puzzle. Two devices have never met. They want to share a secret. Their only way to talk is through a server, which is also listening. How do they end up with a shared key the server doesn't have?
The answer is Diffie-Hellman key agreement, and it's one of the prettiest ideas in cryptography. Standard analogy:
Alice and Bob agree publicly on a base color — say yellow. Each picks a private color in their head and tells nobody.
Alice mixes her private color into yellow and shouts the result across the room. Bob does the same.
Eve, the eavesdropper, hears both shouted mixes and the original yellow. She cannot un-mix paint, so she can't recover either private color.
Now Alice mixes her private color into Bob's shouted mix. Bob mixes his private color into Alice's shouted mix. By the magic of commutativity (mixing is order-independent), they both arrive at the same final shade — yet Eve, with only the two shouted mixes and yellow, cannot.
Replace "color" with "big number raised to a private exponent modulo a prime," and "un-mixing" with "the discrete logarithm problem," and you have Diffie-Hellman. The underlying problem is believed to be intractable at the sizes used in practice: with the best known classical techniques, breaking it would take longer than the age of the universe.
The version Vita uses is X25519, a specific elliptic-curve flavor of Diffie-Hellman that is fast, has no known weaknesses, and is the same primitive Signal, WireGuard, and TLS 1.3 use.
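The paint-mixing steps map directly onto modular exponentiation. The following is a toy sketch of the classic modular form, not X25519: the prime and base are chosen for readability, and nothing here should be used to protect real data.

```python
import secrets

# Toy parameters for illustration only: a 127-bit Mersenne prime and base 3.
# Real systems use vetted groups or curves such as X25519.
P = 2 ** 127 - 1
G = 3

def keypair() -> tuple[int, int]:
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 3) + 2   # private in [2, P-2]
    return private, pow(G, private, P)

def shared_secret(my_private: int, their_public: int) -> int:
    """Mix my private exponent into the other side's public value."""
    return pow(their_public, my_private, P)

alice_private, alice_public = keypair()      # Alice "shouts" alice_public
bob_private, bob_public = keypair()          # Bob "shouts" bob_public

alice_secret = shared_secret(alice_private, bob_public)
bob_secret = shared_secret(bob_private, alice_public)
print(alice_secret == bob_secret)
```

Both sides land on the same number because (G^a)^b and (G^b)^a are equal modulo P, which is the "mixing is order-independent" step of the analogy.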
Where do keys live?
The math is one thing; the implementation is another. A 256-bit key is ultimately bytes in some computer's memory or storage. Where you put those bytes matters.
Vita uses two browser features to keep keys out of harm's way:
WebCrypto's "non-extractable" flag. When the browser generates a key, you can mark it non-extractable. Browsers honor this strictly: you can ask WebCrypto to use the key (encrypt this, sign this) but you cannot ask for the key bytes back, ever. Even Vita's own JavaScript can't pull them out. They live behind the browser's API surface, which is closer to the OS than to your tab.
IndexedDB. A storage area scoped to the website's origin. Other sites — even in the same browser — can't see what Vita has stored. The browser flushes it when the user clears site data, but it survives reloads, restarts, and weeks of inactivity.
So: Vita asks WebCrypto for a non-extractable wrapping key, stores it in IndexedDB, then uses it to wrap (encrypt) the master key. The wrapped master key also goes in IndexedDB. To use the master key, Vita asks WebCrypto to unwrap it for the duration of one operation. The raw bytes are never visible to JavaScript.
How Vita stitches it all together
Now we can read the actual flow.
1. First launch. Vita asks WebCrypto to generate K_master, a
256-bit symmetric key. It generates the wrapping key, wraps K_master,
puts both in IndexedDB. Done. No network call, no signup, no password.
2. Every change you make. The WASM core mutates an Automerge
document. The change-bytes get sealed by XChaCha20-Poly1305 under
K_master. The sealed record (a 24-byte nonce + the ciphertext + a
16-byte tag) is what gets:
- appended to a file in OPFS, your encrypted local log
- (optionally, if sync is on) sent over the WebSocket to the relay
The relay sees an opaque blob. It can route the blob to your other devices. It can't read the blob.
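As a rough sketch of what a sealed record looks like on disk or on the wire, the helper below splits one into its parts. It assumes the bytes are laid out in exactly the order listed above (nonce, then ciphertext, then tag); the function names are hypothetical.

```python
NONCE_LEN = 24  # XChaCha20 nonce size in bytes
TAG_LEN = 16    # Poly1305 tag size in bytes

def split_sealed_record(record: bytes) -> tuple[bytes, bytes, bytes]:
    """Split nonce || ciphertext || tag into its three parts."""
    if len(record) < NONCE_LEN + TAG_LEN:
        raise ValueError("record too short to hold a nonce and a tag")
    nonce = record[:NONCE_LEN]
    ciphertext = record[NONCE_LEN:-TAG_LEN]
    tag = record[-TAG_LEN:]
    return nonce, ciphertext, tag

def sealed_size(plaintext_len: int) -> int:
    """The cipher adds no padding, so overhead is just nonce plus tag."""
    return NONCE_LEN + plaintext_len + TAG_LEN

# Demo with placeholder bytes (not real ciphertext).
record = bytes(24) + b"x" * 10 + bytes(16)
nonce, ciphertext, tag = split_sealed_record(record)
print(len(nonce), len(ciphertext), len(tag))
```

The fixed 40 bytes of overhead is why the relay can still infer roughly how large each change was, even though it cannot read it.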
3. Stream identifiers. When the relay fans events out, it groups
them by what we call a stream_id. Vita computes:
K_stream = HKDF(K_master, info="stream-v1")
stream_id = HMAC-SHA256(K_stream, "observations")
// or "templates", "categories", etc.
stream_id is a 32-byte value that looks uniformly random to anyone without K_master. Two of your devices compute the same stream_id from the same K_master, so they line up correctly. No one else can tell which stream_id corresponds to which document, or whether two streams belong to the same kind of data.
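The two-line formula can be made runnable with the standard library. This sketch assumes an empty HKDF salt, a 32-byte K_stream, and UTF-8 encoding of the stream name; none of those details are stated above, so treat them as placeholders.

```python
import hashlib
import hmac

def hkdf_sha256_32(key: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869) with an empty salt, for a single 32-byte output."""
    prk = hmac.new(bytes(32), key, hashlib.sha256).digest()         # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()   # expand, block 1

def stream_id(k_master: bytes, stream_name: str) -> bytes:
    """K_stream = HKDF(K_master, "stream-v1"); id = HMAC-SHA256(K_stream, name)."""
    k_stream = hkdf_sha256_32(k_master, b"stream-v1")
    return hmac.new(k_stream, stream_name.encode("utf-8"), hashlib.sha256).digest()

k_master = bytes(range(32))                   # placeholder key, demo only
print(stream_id(k_master, "observations").hex())
print(stream_id(k_master, "templates").hex())
```

Two devices holding the same K_master get identical identifiers with no coordination, while a different K_master yields unrelated values for the same stream names.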
4. Pairing a new device. This is where Diffie-Hellman shows up.
The new device must end up with K_master, but K_master must never
appear in the clear on the network.
- Device A (already paired) generates a fresh ephemeral X25519 keypair. The public half goes into a QR code along with a short pairing code. The private half stays on A.
- Device B (joining) scans the QR, generates its own ephemeral X25519 keypair, and runs Diffie-Hellman against A's public key.
- Both devices independently derive the same transport key from the DH shared secret via HKDF.
- Both devices also derive a 6-digit short authentication string (SAS) from the same shared secret — same DH input, different HKDF label, so the SAS is independent of the transport key. Each device shows its SAS on screen.
- The user reads the digits on both screens. Device A only releases K_master after the user confirms on A's screen that the codes match. If they don't match, the user clicks "Don't match" and the pairing is aborted.
- Once the user confirms, A seals K_master under the transport key with XChaCha20-Poly1305 and sends the sealed payload through the relay.
- B unseals it, stores it (wrapped, like A did originally), and registers itself with the relay as a new device.
The relay is in the loop the whole time but only ever sees ciphertext.
The transport key it would need to read the sealed K_master was
computed from two ephemeral private keys that never left their
respective devices. The SAS step exists so the user — not the relay —
is the final authority on whether the two devices actually saw each
other's keys.
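The building blocks (Diffie-Hellman, then HKDF, then AEAD) are the same family Signal's X3DH draws on, and the on-screen code comparison serves the same purpose as Signal's "safety numbers": a human check that nobody swapped keys in the middle.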
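The "same DH input, different HKDF label" step from the pairing flow might look like the sketch below. The label strings and the way the 6-digit code is reduced from the derived bytes are assumptions made for illustration; the text above only says that both values come from the shared secret via HKDF.

```python
import hashlib
import hmac

def hkdf_sha256_32(key: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869) with an empty salt, for a single 32-byte output."""
    prk = hmac.new(bytes(32), key, hashlib.sha256).digest()         # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()   # expand, block 1

def derive_pairing_outputs(dh_shared_secret: bytes) -> tuple[bytes, str]:
    """Derive the transport key and the 6-digit SAS from one DH secret."""
    transport_key = hkdf_sha256_32(dh_shared_secret, b"pairing-transport-v1")  # hypothetical label
    sas_bytes = hkdf_sha256_32(dh_shared_secret, b"pairing-sas-v1")            # hypothetical label
    # Reduce 8 derived bytes to six decimal digits (modulo bias is negligible).
    sas = int.from_bytes(sas_bytes[:8], "big") % 1_000_000
    return transport_key, f"{sas:06d}"

secret = bytes(range(32))                     # placeholder DH output, demo only
key_on_a, sas_on_a = derive_pairing_outputs(secret)
key_on_b, sas_on_b = derive_pairing_outputs(secret)
print(sas_on_a, sas_on_b)
```

Because the code shown to the user comes from a different label than the transport key, reading it aloud or photographing it reveals nothing useful about the key that protects K_master.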
5. Recovery phrase (optional). When the user enables it, the device generates a 12-word phrase using BIP-39 — the same wordlist Bitcoin wallets use. From the phrase, two values get derived via HKDF (different labels, so they're independent):
- a 32-byte recovery_id, used by the relay as a public lookup key
- a 32-byte wrapping key
The device seals K_master together with account_id under the
wrapping key, then uploads the resulting 88-byte blob to the relay
keyed by recovery_id. The relay sees opaque bytes and a
random-looking 32-byte id. The phrase never leaves the user's head
and the wrapping key never leaves the device.
To restore on a fresh install, the user types the 12 words. The new
device re-derives recovery_id and the wrapping key, fetches the
blob from the relay, decrypts to recover K_master and account_id,
and registers itself as a new device. From there it pulls the
encrypted event log and is back in business.
If no device still holds K_master, the phrase is the only thing that can do this; lose it too and the account on the relay is permanent ciphertext. That's the same trade-off any end-to-end-encrypted system makes.
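The derivation from phrase to recovery_id and wrapping key can be sketched with the standard library. The PBKDF2 parameters below are the ones BIP-39 specifies for turning a phrase into a 64-byte seed; whether Vita uses exactly these, and the two HKDF labels, are assumptions. The sketch also skips wordlist and checksum validation.

```python
import hashlib
import hmac
import unicodedata

def bip39_seed(phrase: str, passphrase: str = "") -> bytes:
    """BIP-39 seed: PBKDF2-HMAC-SHA512, 2048 rounds, salt 'mnemonic' + passphrase."""
    password = unicodedata.normalize("NFKD", phrase).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)

def hkdf_sha256_32(key: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869) with an empty salt, for a single 32-byte output."""
    prk = hmac.new(bytes(32), key, hashlib.sha256).digest()         # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()   # expand, block 1

def derive_recovery_values(phrase: str) -> tuple[bytes, bytes]:
    """Return (recovery_id, wrapping_key), each 32 bytes."""
    seed = bip39_seed(phrase)
    recovery_id = hkdf_sha256_32(seed, b"recovery-id-v1")      # hypothetical label
    wrapping_key = hkdf_sha256_32(seed, b"recovery-wrap-v1")   # hypothetical label
    return recovery_id, wrapping_key

# A widely published BIP-39 test phrase. Never use it for a real account.
demo_phrase = " ".join(["abandon"] * 11 + ["about"])
recovery_id, wrapping_key = derive_recovery_values(demo_phrase)
print(recovery_id.hex())
```

A fresh install that is given the same 12 words recomputes the same two values, which is all it needs to find the blob and open it.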
What about man-in-the-middle attacks?
This is the question every honest cryptography post has to answer. The short version: the relay can't get in unless the user lets it.
Steady-state sync. Every record is sealed with AEAD under
K_master. A malicious relay can drop your events, but it cannot
forge or modify them — the tag check on the receiving device would
fail. So once two devices share K_master, the relay is essentially
a dumb pipe. Even if it's hostile.
TLS between your browser and the relay. Standard browser CA model. As secure as any HTTPS site you visit, modulo the usual caveats about mis-issued certificates and hostile CAs. The relay must be served over HTTPS in production.
Pairing via QR code: safe. The QR code carries A's ephemeral public key directly from A's screen to B's camera. That's an authenticated channel: the relay isn't in it. A malicious relay can't substitute a different public key without B's camera lying. This path is MITM-resistant on its own.
Pairing via typed short-code: protected by SAS. If you don't have a camera and you type the short code on B, B has to fetch A's ephemeral public key from the relay. A malicious relay could in principle swap that for its own public key and try to run Diffie-Hellman with each device independently. The defense is the short authentication string described in the pairing flow: both devices derive a 6-digit code from their respective shared secrets and show it on screen. If the relay is in the middle, the two shared secrets diverge and the codes don't match. The user reads the numbers on both screens before clicking "match," and that's the trip-wire — the relay can't fabricate a 6-digit code that lines up with the legitimate side without breaking Diffie-Hellman, which it can't.
The honest caveat: SAS only works if the user actually looks at both screens. A user who reflexively clicks "match" without reading is giving the relay back the keys to the kingdom. We label the buttons "Codes match — pair" and "Don't match" specifically to make the act of confirming feel like a check, not a continue. But we can't take that last hop for the user.
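The relay-in-the-middle scenario for the typed short-code path can be simulated with toy modular Diffie-Hellman (again, not X25519, and the parameters are for illustration only). The relay hands each device one of its own public keys and ends up sharing a different secret with each side.

```python
import secrets

P = 2 ** 127 - 1   # toy prime, illustration only
G = 3

def keypair() -> tuple[int, int]:
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def shared_secret(my_private: int, their_public: int) -> int:
    return pow(their_public, my_private, P)

a_priv, a_pub = keypair()      # device A
b_priv, b_pub = keypair()      # device B
r1_priv, r1_pub = keypair()    # relay's key, shown to A in place of B's
r2_priv, r2_pub = keypair()    # relay's key, shown to B in place of A's

# Each device unknowingly runs Diffie-Hellman with the relay.
a_secret = shared_secret(a_priv, r1_pub)
b_secret = shared_secret(b_priv, r2_pub)

# The relay can compute both legs, but the two legs are different secrets.
relay_a_leg = shared_secret(r1_priv, a_pub)
relay_b_leg = shared_secret(r2_priv, b_pub)
print(a_secret == relay_a_leg, b_secret == relay_b_leg, a_secret == b_secret)
```

Since A and B hold different secrets, any code derived from them will disagree except by chance, which for six decimal digits is about one in a million per attempt. That mismatch is what the user is checking for.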
Recovery phrase: protected by the math, vulnerable to the user. The blob on the relay is sealed under a key derived from the user's 12-word phrase. With 128 bits of entropy stretched through PBKDF2 to a 64-byte seed, then HKDF, the relay can't brute-force its way in — the keyspace is far too large. But the phrase itself is now the single point of failure: anyone with those 12 words can recover the account from any working relay. The system can't tell the difference between you and someone reading your sticky note.
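The 128-bit figure follows from how BIP-39 encodes a phrase, as this short calculation shows. It relies only on the published BIP-39 parameters: a 2048-word list and one checksum bit per 32 bits of entropy.

```python
WORDLIST_SIZE = 2048   # BIP-39 wordlists contain 2048 words
WORDS = 12

bits_per_word = WORDLIST_SIZE.bit_length() - 1   # log2(2048) = 11
total_bits = WORDS * bits_per_word               # bits encoded by the phrase
checksum_bits = total_bits // 33                 # 1 checksum bit per 32 entropy bits
entropy_bits = total_bits - checksum_bits        # bits an attacker must guess

print(bits_per_word, total_bits, checksum_bits, entropy_bits)
```

So a 12-word phrase carries 132 bits, of which 4 are a checksum and 128 are random.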
What the relay can and can't see
A summary, since this is what people usually want to know:
| Visible to the relay | Hidden from the relay |
|---|---|
| your account ID | habit names |
| your device IDs | category names |
| your device labels (e.g. "Pixel 8") | observations and resolutions |
| sealed event sizes and timestamps | your name, timezone, locale |
| stream IDs (random-looking 32-byte values) | the structure of your Automerge changes |
| recovery_id (32 random-looking bytes) | the recovery phrase, K_master, account_id inside the recovery blob |
| TLS metadata (your IP) | everything inside the sealed records |
If the relay were compromised tomorrow, an attacker would learn that your account exists, that you have devices, when you opened the app, and how much you typed. They would not learn what you typed, what habits you track, or anything else inside Vita.
Why all this complication?
Plain old web apps are simpler: you log in with a password, the server holds your data, end of story. You trust the server with everything.
The trade Vita makes is more code on your device for less trust in any company, including ours. The relay is replaceable. The encryption is verifiable from the source. The only place your data exists in the clear is on devices you control. That property — "the company can vanish and you still have your habits" — is the actual point. The cryptography is just the machinery that makes it true.
Further reading
- How it works — the deeper technical companion, with the same ground covered more densely
- Local-first software — Ink & Switch's foundational paper
- Cryptographic Right Answers — Thomas Ptáček on which primitives to actually use
- The X25519 paper — Daniel J. Bernstein on the curve we use
- The Noise Protocol Framework — the family of patterns Vita's pairing handshake belongs to