Files
arbiter/ARCHITECTURE.md

333 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Arbiter
Arbiter is a permissioned signing service for cryptocurrency wallets. It runs as a background service on the user's machine with an optional client application for vault management.
**Core principle:** The vault NEVER exposes key material. It only produces signatures when a request satisfies the configured policies.
---
## 1. Peer Types
Arbiter distinguishes two kinds of peers:
- **User Agent** — A client application used by the owner to manage the vault (create wallets, approve SDK clients, configure policies).
- **SDK Client** — A consumer of signing capabilities, typically an automation tool. In the future, this could include a browser-based wallet.
- **Recovery Operator** — A dormant recovery participant with narrowly scoped authority used only for custody recovery and operator replacement.
---
## 2. Authentication
### 2.1 Challenge-Response
All peers authenticate via public-key cryptography using a challenge-response protocol:
1. The peer sends its public key and requests a challenge.
2. The server looks up the key in its database. If found, it increments the nonce and returns a challenge (replay-attack protection).
3. The peer signs the challenge with its private key and sends the signature back.
4. The server verifies the signature:
- **Pass:** The connection is considered authenticated.
- **Fail:** The server closes the connection.
### 2.2 User Agent Bootstrap
On first run — when no User Agents are registered — the server generates a one-time bootstrap token. It is made available in two ways:
- **Local setup:** Written to `~/.arbiter/bootstrap_token` for automatic discovery by a co-located User Agent.
- **Remote setup:** Printed to the server's console output.
The first User Agent must present this token alongside the standard challenge-response to complete registration.
### 2.3 SDK Client Registration
There is no bootstrap mechanism for SDK clients. They must be explicitly approved by an already-registered User Agent.
---
## 3. Multi-Operator Governance
When more than one User Agent is registered, the vault is treated as having multiple operators. In that mode, sensitive actions are governed by voting rather than by a single operator decision.
### 3.1 Voting Rules
Voting is based on the total number of registered operators:
- **1 operator:** no vote is needed; the single operator decides directly.
- **2 operators:** full consensus is required; both operators must approve.
- **3 or more operators:** quorum is `floor(N / 2) + 1`.
For a decision to count, the operator's approval or rejection must be signed by that operator's associated key. Unsigned votes, or votes that fail signature verification, are ignored.
Examples:
- **3 operators:** 2 approvals required
- **4 operators:** 3 approvals required
### 3.2 Actions Requiring a Vote
In multi-operator mode, a successful vote is required for:
- approving new SDK clients
- granting an SDK client visibility to a wallet
- approving a one-off transaction
- approving creation of a persistent grant
- approving operator replacement
- approving server updates
- updating Shamir secret-sharing parameters
### 3.3 Special Rule for Key Rotation
Key rotation always requires full quorum, regardless of the normal voting threshold.
This is stricter than ordinary governance actions because rotating the root key requires every operator to participate in coordinated share refresh/update steps. The root key itself is not redistributed directly, but each operator's share material must be changed consistently.
### 3.4 Root Key Custody
When the vault has multiple operators, the vault root key is protected using Shamir secret sharing.
The vault root key is encrypted in a way that requires reconstruction from user-held shares rather than from a single shared password.
For ordinary operators, the Shamir threshold matches the ordinary governance quorum. For example:
- **2 operators:** `2-of-2`
- **3 operators:** `2-of-3`
- **4 operators:** `3-of-4`
In practice, the Shamir share set also includes Recovery Operator shares. This means the effective Shamir parameters are computed over the combined share pool while keeping the same threshold. For example:
- **3 ordinary operators + 2 recovery shares:** `2-of-5`
This ensures that the normal custody threshold follows the ordinary operator quorum, while still allowing dormant recovery shares to exist for break-glass recovery flows.
### 3.5 Recovery Operators
Recovery Operators are a separate peer type from ordinary vault operators.
Their role is intentionally narrow. They can only:
- participate in unsealing the vault
- vote for operator replacement
Recovery Operators do not participate in routine governance such as approving SDK clients, granting wallet visibility, approving transactions, creating grants, approving server updates, or changing Shamir parameters.
### 3.6 Sleeping and Waking Recovery Operators
By default, Recovery Operators are **sleeping** and do not participate in any active flow.
Any ordinary operator may request that Recovery Operators **wake up**.
Any ordinary operator may also cancel a pending wake-up request.
This creates a dispute window before recovery powers become active. The default wake-up delay is **14 days**.
Recovery Operators are therefore part of the break-glass recovery path rather than the normal operating quorum.
The high-level recovery flow is:
```mermaid
sequenceDiagram
autonumber
actor Op as Ordinary Operator
participant Server
actor Other as Other Operator
actor Rec as Recovery Operator
Op->>Server: Request recovery wake-up
Server-->>Op: Wake-up pending
Note over Server: Default dispute window: 14 days
alt Wake-up cancelled during dispute window
Other->>Server: Cancel wake-up
Server-->>Op: Recovery cancelled
Server-->>Rec: Stay sleeping
else No cancellation for 14 days
Server-->>Rec: Wake up
Rec->>Server: Join recovery flow
critical Recovery authority
Rec->>Server: Participate in unseal
Rec->>Server: Vote on operator replacement
end
Server-->>Op: Recovery mode active
end
```
### 3.7 Committee Formation
There are two ways to form a multi-operator committee:
- convert an existing single-operator vault by adding new operators
- bootstrap an unbootstrapped vault directly into multi-operator mode
In both cases, committee formation is a coordinated process. Arbiter does not allow multi-operator custody to emerge implicitly from unrelated registrations.
### 3.8 Bootstrapping an Unbootstrapped Vault into Multi-Operator Mode
When an unbootstrapped vault is initialized as a multi-operator vault, the setup proceeds as follows:
1. An operator connects to the unbootstrapped vault using a User Agent and the bootstrap token.
2. During bootstrap setup, that operator declares:
- the total number of ordinary operators
- the total number of Recovery Operators
3. The vault enters **multi-bootstrap mode**.
4. While in multi-bootstrap mode:
- every ordinary operator must connect with a User Agent using the bootstrap token
- every Recovery Operator must also connect using the bootstrap token
- each participant is registered individually
- each participant's share is created and protected with that participant's credentials
5. The vault is considered fully bootstrapped only after all declared operator and recovery-share registrations have completed successfully.
This means the operator and recovery set is fixed at bootstrap completion time, based on the counts declared when multi-bootstrap mode was entered.
### 3.9 Special Bootstrap Constraint for Two-Operator Vaults
If a vault is declared with exactly **2 ordinary operators**, Arbiter requires at least **1 Recovery Operator** to be configured during bootstrap.
This prevents the worst-case custody failure in which a `2-of-2` operator set becomes permanently unrecoverable after loss of a single operator.
---
## 4. Server Identity
The server proves its identity using TLS with a self-signed certificate. The TLS private key is generated on first run and is long-term; no rotation mechanism exists yet due to the complexity of multi-peer coordination.
Peers verify the server by its **public key fingerprint**:
- **User Agent (local):** Receives the fingerprint automatically through the bootstrap token.
- **User Agent (remote) / SDK Client:** Must receive the fingerprint out-of-band.
> A streamlined setup mechanism using a single connection string is planned but not yet implemented.
---
## 5. Key Management
### 5.1 Key Hierarchy
There are three layers of keys:
| Key | Encrypts | Encrypted by |
|---|---|---|
| **User key** (password) | Root key | — (derived from user input) |
| **Root key** | Wallet keys | User key |
| **Wallet keys** | — (used for signing) | Root key |
This layered design enables:
- **Password rotation** without re-encrypting every wallet key (only the root key is re-encrypted).
- **Root key rotation** without requiring the user to change their password.
### 5.2 Encryption at Rest
The database stores everything in encrypted form using symmetric AEAD. The encryption scheme is versioned to support transparent migration — when the vault unseals, Arbiter automatically re-encrypts any entries that are behind the current scheme version. See [IMPLEMENTATION.md](IMPLEMENTATION.md) for the specific scheme and versioning mechanism.
---
## 6. Vault Lifecycle
### 6.1 Sealed State
On boot, the root key is encrypted and the server cannot perform any signing operations. This state is called **Sealed**.
### 6.2 Unseal Flow
To transition to the **Unsealed** state, a User Agent must provide the password:
1. The User Agent initiates an unseal request.
2. The server generates a one-time key pair and returns the public key.
3. The User Agent encrypts the user's password with this one-time public key and sends the ciphertext to the server.
4. The server decrypts and verifies the password:
- **Success:** The root key is decrypted and placed into a hardened memory cell. The server transitions to `Unsealed`. Any entries pending encryption scheme migration are re-encrypted.
- **Failure:** The server returns an error indicating the password is incorrect.
### 6.3 Memory Protection
Once unsealed, the root key must be protected in memory against:
- Memory dumps
- Page swaps to disk
- Hibernation files
See [IMPLEMENTATION.md](IMPLEMENTATION.md) for the current and planned memory protection approaches.
---
## 7. Permission Engine
### 7.1 Fundamental Rules
- SDK clients have **no access by default**.
- Access is granted **explicitly** by a User Agent.
- Grants are scoped to **specific wallets** and governed by **policies**.
Each blockchain requires its own policy system due to differences in static transaction analysis. Currently, only EVM is supported; Solana support is planned.
Arbiter is also responsible for ensuring that **transaction nonces are never reused**.
### 7.2 EVM Policies
Every EVM grant is scoped to a specific **wallet** and **chain ID**.
#### 7.2.0 Transaction Signing Sequence
The high-level interaction order is:
```mermaid
sequenceDiagram
autonumber
actor SDK as SDK Client
participant Server
participant UA as User Agent
SDK->>Server: SignTransactionRequest
Server->>Server: Resolve wallet and wallet visibility
alt Visibility approval required
Server->>UA: Ask for wallet visibility approval
UA-->>Server: Vote result
end
Server->>Server: Evaluate transaction
Server->>Server: Load grant and limits context
alt Grant approval required
Server->>UA: Ask for execution / grant approval
UA-->>Server: Vote result
opt Create persistent grant
Server->>Server: Create and store grant
end
Server->>Server: Retry evaluation
end
critical Final authorization path
Server->>Server: Check limits and record execution
Server-->>Server: Signature or evaluation error
end
Server-->>SDK: Signature or error
```
#### 7.2.1 Transaction Sub-Grants
Arbiter maintains an ever-expanding database of known contracts and their ABIs. Based on contract knowledge, transaction requests fall into three categories:
**1. Known contract (ABI available)**
The transaction can be decoded and presented with semantic meaning. For example: *"Client X wants to transfer Y USDT to address Z."*
Available restrictions:
- Volume limits (e.g., "no more than 10,000 tokens ever")
- Rate limits (e.g., "no more than 100 tokens per hour")
**2. Unknown contract (no ABI)**
The transaction cannot be decoded, so its effects are opaque — it could do anything, including draining all tokens. The user is warned, and if approved, access is granted to all interactions with the contract (matched by the `to` field).
Available restrictions:
- Transaction count limits (e.g., "no more than 100 transactions ever")
- Rate limits (e.g., "no more than 5 transactions per hour")
**3. Plain ether transfer (no calldata)**
These transactions have no `calldata` and therefore cannot interact with contracts. They can be subject to the same volume and rate restrictions as above.
#### 7.2.2 Global Limits
In addition to sub-grant-specific restrictions, the following limits can be applied across all grant types:
- **Gas limit** — Maximum gas per transaction.
- **Time-window restrictions** — e.g., signing allowed only 08:0020:00 on Mondays and Thursdays.