Bug: MemSafe::new panics under concurrent load due to VirtualLock quota exhaustion (Windows) #28
Description
MemSafe::new(...) on Windows calls VirtualLock, which pins memory pages in RAM to prevent sensitive data from being swapped to disk. Each process has a Working Set Quota, an OS-enforced limit on the total amount of locked memory.
When multiple concurrent tasks call MemSafe::new(...).unwrap() simultaneously, they can collectively exhaust this quota. Once exhausted, VirtualLock returns an error, MemSafe::new returns Err, and .unwrap() panics.
How it was discovered
RUST_TEST_THREADS = "1" in .cargo/config.toml masks the problem: running tests sequentially creates no concurrent pressure on the quota.
cargo nextest also does not reproduce it, because it isolates each test in a separate process with its own quota.
In production the server is a single process — all concurrent requests compete for the same quota.
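The difference between the test setups and production can be illustrated with a small simulation. This is only a sketch: LockQuota and count_failures are hypothetical names, the quota is modeled as a plain counter rather than a real VirtualLock call, and locked pages are never released, which models allocations that are all live at the same moment.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the per-process locked-memory quota.
struct LockQuota {
    remaining_pages: AtomicUsize,
}

impl LockQuota {
    fn new(pages: usize) -> Self {
        Self { remaining_pages: AtomicUsize::new(pages) }
    }

    /// Mimics a page lock: succeeds while pages remain, fails afterwards.
    fn try_lock_page(&self) -> Result<(), &'static str> {
        self.remaining_pages
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .map(|_| ())
            .map_err(|_| "lock quota exhausted")
    }
}

/// Spawns `tasks` threads that share one quota. Each thread locks one page
/// and never releases it. Returns how many lock attempts failed.
fn count_failures(quota_pages: usize, tasks: usize) -> usize {
    let quota = Arc::new(LockQuota::new(quota_pages));
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let quota = Arc::clone(&quota);
            thread::spawn(move || quota.try_lock_page().is_err())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .filter(|failed| *failed)
        .count()
}
```

With a shared quota of 8 pages, count_failures(8, 20) reports 12 failed attempts, while count_failures(8, 1), the analogue of one test per process, reports none. In the real code each of those failures would hit an .unwrap().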
Failure points
All .unwrap() calls on MemSafe::new in paths that can be concurrent:
- crates/arbiter-server/src/actors/keyholder/mod.rs:228 try_unseal
- crates/arbiter-server/src/actors/keyholder/mod.rs:282 decrypt
- crates/arbiter-server/src/actors/keyholder/encryption/v1.rs:64 KeyCell::try_from
- crates/arbiter-server/src/actors/keyholder/encryption/v1.rs:76 KeyCell::new_secure_random
- crates/arbiter-server/src/actors/keyholder/encryption/v1.rs:152 derive_seal_key
- crates/arbiter-server/src/evm/safe_signer.rs:47 generate
- crates/arbiter-server/src/evm/safe_signer.rs:78 SafeSigner::new
- crates/arbiter-server/src/actors/evm/mod.rs:104
- crates/arbiter-server/src/actors/user_agent/session.rs:236
Production failure scenario
N concurrent EVM sign-transaction requests arrive. Each goes through:
1. keyholder.decrypt() → MemSafe::new(ciphertext).unwrap() + MemSafe::new(Key) inside KeyCell
2. evm::sign_transaction → MemSafe::new(key_bytes).unwrap() + SafeSigner::new → another MemSafe
At N ≈ 20–50 (depending on data size and system quota), the quota is exhausted. The next .unwrap() panics: the tokio task crashes, or, if the panic propagates past the actor framework boundary, the process terminates.
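The dependence of N on data size and quota can be made explicit with a back-of-the-envelope estimate. The sketch below assumes that each of the four MemSafe allocations in the scenario above occupies its own whole pages (VirtualLock locks at page granularity). The function name and all inputs are illustrative, not measured values.

```rust
/// Rough upper bound on the number of requests that can hold their
/// locked allocations at the same time.
/// Returns None if a request would lock zero pages or the product overflows.
fn max_concurrent_requests(
    lockable_pages: usize,     // pages the process may lock (assumed input)
    allocs_per_request: usize, // MemSafe allocations per request (4 in the scenario)
    pages_per_alloc: usize,    // pages each allocation occupies (depends on data size)
) -> Option<usize> {
    let pages_per_request = allocs_per_request.checked_mul(pages_per_alloc)?;
    lockable_pages.checked_div(pages_per_request)
}
```

For example, with a hypothetical budget of 100 lockable pages and one page per allocation, max_concurrent_requests(100, 4, 1) gives Some(25). Larger secrets that span two pages each would halve that.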
Fix options
1. Replace .unwrap() with ? / map_err and return Error::MemSafeAllocation, signaling the client with an error instead of panicking.
2. Call SetProcessWorkingSetSize with headroom during server initialization (workaround, does not scale).
3. Pool MemSafe buffers of fixed size and reuse them across requests.
4. Use a Semaphore so the number of live MemSafe allocations at any given moment stays within a safe bound.
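The error-propagation and Semaphore options above could be combined roughly as follows. This is a minimal sketch using only the standard library: secure_alloc is a hypothetical stand-in for MemSafe::new, the String payload on Error::MemSafeAllocation is an assumption, and Limiter is a hand-rolled blocking semaphore used here only to keep the example self-contained. In the actual async server an async-aware semaphore such as tokio::sync::Semaphore would be the more natural fit.

```rust
use std::fmt;
use std::sync::{Condvar, Mutex};

/// Error returned to the caller instead of panicking.
/// The variant name is from the issue; the payload type is an assumption.
#[derive(Debug, PartialEq)]
pub enum Error {
    MemSafeAllocation(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::MemSafeAllocation(msg) => write!(f, "secure allocation failed: {msg}"),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical stand-in for MemSafe::new: fails when `quota_ok` is false.
fn secure_alloc(data: Vec<u8>, quota_ok: bool) -> Result<Vec<u8>, String> {
    if quota_ok {
        Ok(data)
    } else {
        Err("VirtualLock failed".to_string())
    }
}

/// Counting semaphore that bounds the number of live secure allocations.
pub struct Limiter {
    permits: Mutex<usize>,
    available: Condvar,
}

/// Held while a secure allocation is live; returns the permit on drop.
pub struct Permit<'a> {
    limiter: &'a Limiter,
}

impl Limiter {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut permits = self.permits.lock().unwrap_or_else(|e| e.into_inner());
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap_or_else(|e| e.into_inner());
        }
        *permits -= 1;
        Permit { limiter: self }
    }

    /// Number of permits currently free.
    pub fn available(&self) -> usize {
        *self.permits.lock().unwrap_or_else(|e| e.into_inner())
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        let mut permits = self.limiter.permits.lock().unwrap_or_else(|e| e.into_inner());
        *permits += 1;
        self.limiter.available.notify_one();
    }
}

/// Takes a permit, then allocates; a failed allocation becomes an Err
/// for the caller rather than a panic. Returns the plaintext length.
pub fn decrypt_bounded(limiter: &Limiter, ciphertext: Vec<u8>, quota_ok: bool) -> Result<usize, Error> {
    let _permit = limiter.acquire();
    let secret = secure_alloc(ciphertext, quota_ok).map_err(Error::MemSafeAllocation)?;
    Ok(secret.len())
}
```

The permit is released when _permit goes out of scope, including on the error path, so a failed allocation does not leak capacity. The permit count would need to be chosen below the number of allocations the quota can actually sustain.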