RPC & Circular Buffers

The lock-free SPSC queue design and RPC protocol that connect the host and enclave.

Enclave OS uses a custom RPC protocol over shared-memory circular buffers to communicate between the host (untrusted) and the enclave (trusted). This design minimizes the number of expensive ECALL/OCALL transitions while maintaining high throughput.

Why Not Standard ECALLs/OCALLs?

The SGX SDK provides ECALLs (host → enclave) and OCALLs (enclave → host) as the standard way to cross the enclave boundary. Each call involves:

  • A context switch costing ~10,000 CPU cycles.
  • Parameter marshalling — serializing and copying data across the boundary.
  • EEXIT/EENTER instructions that flush the TLB and pipeline.

For a web server handling many requests per second, calling ECALLs/OCALLs for every read/write operation would be prohibitively slow.
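To put the transition cost in perspective, the arithmetic can be sketched as below. The request rate, crossings per request, and clock speed are illustrative assumptions rather than measurements; only the ~10,000-cycle figure comes from the text above. The function names are hypothetical.

```rust
/// Cycles spent per second on boundary crossings alone.
/// All inputs are illustrative assumptions, not measured values.
pub fn transition_cycles_per_sec(
    requests_per_sec: u64,
    crossings_per_request: u64,
    cycles_per_crossing: u64,
) -> u64 {
    requests_per_sec * crossings_per_request * cycles_per_crossing
}

/// Share of a single core (in whole percent, rounded down) that the
/// given cycle budget consumes at a given clock frequency.
pub fn percent_of_core(cycles_per_sec: u64, clock_hz: u64) -> u64 {
    cycles_per_sec * 100 / clock_hz
}
```

Under these assumptions, 50,000 requests per second with four crossings each would spend 2 billion cycles per second on transitions, roughly two-thirds of a 3 GHz core, before any useful work is done.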

Instead, Enclave OS makes one ECALL (ecall_run) to start the enclave's event loop, then all data flows through shared memory — memory regions that both the host and enclave can access. The only OCALL (ocall_notify) is a lightweight signal with no data payload.

SPSC Circular Buffers

The shared memory is organized as Single-Producer Single-Consumer (SPSC) circular buffers — a lock-free data structure where one thread writes and one thread reads, without any mutex or atomic compare-and-swap.

Memory Layout

┌──────────────────────────────────────────────────┐
│                   2 MB Buffer                    │
├──────────┬───────────────────────────┬───────────┤
│ Header   │         Data Region       │  Padding  │
│ (64B)    │                           │  (align)  │
└──────────┴───────────────────────────┴───────────┘

Header (64 bytes, cache-line aligned):
┌──────────┬──────────┬──────────────────────────┐
│ head: u64│ tail: u64│ padding (48 bytes)       │
│ (atomic) │ (atomic) │ (to fill cache line)     │
└──────────┴──────────┴──────────────────────────┘

Each connection gets two buffers: one for requests (host → enclave) and one for responses (enclave → host).
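The header diagram can be expressed as a Rust struct. This is a minimal sketch, not the project's actual definition: the names RingHeader, BUFFER_SIZE, and DATA_CAPACITY_MAX are hypothetical, and the amount of trailing padding is not specified above, so the data capacity shown is only an upper bound.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicU64;

/// Hypothetical header mirroring the diagram: two 8-byte atomic indices
/// followed by 48 bytes of padding, 64 bytes in total.
#[repr(C, align(64))]
pub struct RingHeader {
    pub head: AtomicU64, // advanced by the consumer
    pub tail: AtomicU64, // advanced by the producer
    _pad: [u8; 48],      // fills the rest of the cache line
}

/// Total size of one buffer, as stated in the layout diagram.
pub const BUFFER_SIZE: usize = 2 * 1024 * 1024;

/// Upper bound on the data region (ignores any trailing alignment padding).
pub const DATA_CAPACITY_MAX: usize = BUFFER_SIZE - size_of::<RingHeader>();

// Compile-time checks that the struct matches the documented layout.
const _: () = assert!(size_of::<RingHeader>() == 64);
const _: () = assert!(align_of::<RingHeader>() == 64);
```

The `repr(C)` attribute fixes the field order, which matters when two separately compiled binaries (host and enclave) interpret the same bytes.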

Key Design Decisions

Decision                        | Rationale
--------------------------------+----------
2 MB capacity                   | Large enough to buffer complete HTTP requests/responses without blocking, small enough that many connections can be served at once.
64-byte cache-line alignment    | The header fills exactly one cache line and starts on a cache-line boundary, so index updates never share a line with payload bytes. In the layout shown above, head and tail sit in the same line; placing them on separate lines would additionally prevent false sharing between the producer and consumer cores.
Atomic Acquire/Release ordering | The weakest memory ordering that guarantees the consumer sees all writes made before the producer advanced the tail. No SeqCst overhead.
4 MB max message size           | A message that crosses the end of the data region wraps around to the start. A 4 MB upper bound on the length field prevents denial of service from oversized messages.

Message Framing

Each message in the buffer is prefixed by a 4-byte header containing the message length (little-endian u32):

┌───────────┬──────────────────────┐
│ len: u32  │  payload (len bytes) │
│ (4 bytes) │                      │
└───────────┴──────────────────────┘

The constant MSG_HEADER_SIZE = 4 is defined in the shared common crate and used by both the host and enclave.
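The framing rule can be illustrated on a flat byte slice, independent of the circular buffer. This is a sketch only: the helper names frame and unframe are hypothetical and do not come from the common crate.

```rust
use std::convert::{TryFrom, TryInto};

pub const MSG_HEADER_SIZE: usize = 4;

/// Prefix `payload` with its length as a little-endian u32.
/// Returns None if the payload does not fit in a u32 length field.
pub fn frame(payload: &[u8]) -> Option<Vec<u8>> {
    let len = u32::try_from(payload.len()).ok()?;
    let mut out = Vec::with_capacity(MSG_HEADER_SIZE + payload.len());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Split one framed message off the front of `buf`.
/// Returns (payload, remaining bytes), or None if `buf` is incomplete.
pub fn unframe(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let header: [u8; MSG_HEADER_SIZE] = buf.get(..MSG_HEADER_SIZE)?.try_into().ok()?;
    let len = u32::from_le_bytes(header) as usize;
    let rest = &buf[MSG_HEADER_SIZE..];
    if rest.len() < len {
        return None; // declared length exceeds the bytes actually present
    }
    Some(rest.split_at(len))
}
```

Checking the declared length against the bytes actually available matters here because the length field is read from memory that the untrusted host can modify.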

Write Path (Producer)

fn send(&self, data: &[u8]) -> Result<()> {
    let total = MSG_HEADER_SIZE + data.len();
    // 1. Read head and tail (atomic Acquire)
    // 2. Check available space (wrapping arithmetic)
    // 3. Write length header
    // 4. Write payload (may wrap around buffer end)
    // 5. Advance tail (atomic Release)
}

Read Path (Consumer)

fn recv(&self) -> Result<Vec<u8>> {
    // 1. Read head and tail (atomic Acquire)
    // 2. Check if data available
    // 3. Read length header
    // 4. Read payload (may wrap around buffer end)
    // 5. Advance head (atomic Release)
}

The wrapping logic handles the case where a message spans the end of the circular buffer — the write/read wraps around to the beginning.
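The two outlines above can be fleshed out as follows. This is a minimal sketch under stated assumptions, not the Enclave OS implementation: the Ring type, its heap-allocated data region, the string error type, and the choice of monotonically increasing byte counters reduced modulo the capacity are all hypothetical. Each side loads its own index with Relaxed ordering and the other side's index with Acquire.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU64, Ordering};

const MSG_HEADER_SIZE: usize = 4;

/// Illustrative ring: `head`/`tail` count total bytes consumed/produced.
pub struct Ring {
    head: AtomicU64,
    tail: AtomicU64,
    data: Box<[UnsafeCell<u8>]>,
}

// Safety: sound only with at most one thread in `send` and one in `recv`.
unsafe impl Sync for Ring {}

impl Ring {
    pub fn new(cap: usize) -> Self {
        assert!(cap >= MSG_HEADER_SIZE);
        Ring {
            head: AtomicU64::new(0),
            tail: AtomicU64::new(0),
            data: (0..cap).map(|_| UnsafeCell::new(0u8)).collect(),
        }
    }

    fn base(&self) -> *mut u8 {
        UnsafeCell::raw_get(self.data.as_ptr())
    }

    /// Copy `src` in at logical position `pos`, wrapping at the end.
    /// Callers ensure `src.len() <= capacity` and that the span is free.
    fn copy_in(&self, pos: u64, src: &[u8]) {
        let cap = self.data.len();
        let start = (pos % cap as u64) as usize;
        let first = src.len().min(cap - start);
        unsafe {
            std::ptr::copy_nonoverlapping(src.as_ptr(), self.base().add(start), first);
            std::ptr::copy_nonoverlapping(src.as_ptr().add(first), self.base(), src.len() - first);
        }
    }

    /// Copy bytes out from logical position `pos`, wrapping at the end.
    fn copy_out(&self, pos: u64, dst: &mut [u8]) {
        let cap = self.data.len();
        let start = (pos % cap as u64) as usize;
        let first = dst.len().min(cap - start);
        unsafe {
            std::ptr::copy_nonoverlapping(self.base().add(start), dst.as_mut_ptr(), first);
            std::ptr::copy_nonoverlapping(self.base(), dst.as_mut_ptr().add(first), dst.len() - first);
        }
    }

    pub fn send(&self, payload: &[u8]) -> Result<(), &'static str> {
        let total = MSG_HEADER_SIZE + payload.len();
        let tail = self.tail.load(Ordering::Relaxed); // producer owns tail
        let head = self.head.load(Ordering::Acquire);
        let used = tail.wrapping_sub(head) as usize;
        if used > self.data.len() || total > self.data.len() - used {
            return Err("not enough free space");
        }
        self.copy_in(tail, &(payload.len() as u32).to_le_bytes());
        self.copy_in(tail + MSG_HEADER_SIZE as u64, payload);
        self.tail.store(tail + total as u64, Ordering::Release); // publish
        Ok(())
    }

    pub fn recv(&self) -> Result<Option<Vec<u8>>, &'static str> {
        let head = self.head.load(Ordering::Relaxed); // consumer owns head
        let tail = self.tail.load(Ordering::Acquire);
        let avail = tail.wrapping_sub(head);
        if avail == 0 {
            return Ok(None);
        }
        if avail > self.data.len() as u64 || avail < MSG_HEADER_SIZE as u64 {
            return Err("corrupt indices");
        }
        let mut hdr = [0u8; MSG_HEADER_SIZE];
        self.copy_out(head, &mut hdr);
        let total = MSG_HEADER_SIZE as u64 + u32::from_le_bytes(hdr) as u64;
        if total > avail {
            return Err("length exceeds available data");
        }
        let mut out = vec![0u8; total as usize - MSG_HEADER_SIZE];
        self.copy_out(head + MSG_HEADER_SIZE as u64, &mut out);
        self.head.store(head + total, Ordering::Release); // free the space
        Ok(Some(out))
    }
}
```

The bounds checks in `recv` reflect the trust model: indices and lengths written by the other side are validated before being used to size an allocation or a copy.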

RPC Protocol

On top of the raw circular buffers, Enclave OS implements a request-response RPC protocol. This provides multiplexing, method dispatch, and error handling.

Request Header (14 bytes)

┌──────────────┬───────────────┬─────────────────┐
│ req_id: u64  │ method: u16   │ payload_len: u32│
│ (8 bytes)    │ (2 bytes)     │ (4 bytes)       │
└──────────────┴───────────────┴─────────────────┘

Response Header (16 bytes)

┌──────────────┬───────────────┬─────────────────┐
│ req_id: u64  │ status: i32   │ payload_len: u32│
│ (8 bytes)    │ (4 bytes)     │ (4 bytes)       │
└──────────────┴───────────────┴─────────────────┘

The req_id field allows the enclave to match responses to requests, enabling potential future pipelining.
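The two header layouts can be sketched as fixed-size encoders and decoders. The field order and sizes follow the diagrams; the byte order is an assumption (little-endian, matching the message length prefix), and the type and method names are hypothetical.

```rust
use std::convert::TryInto;

pub const REQ_HEADER_SIZE: usize = 14;
pub const RESP_HEADER_SIZE: usize = 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RequestHeader {
    pub req_id: u64,
    pub method: u16,
    pub payload_len: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResponseHeader {
    pub req_id: u64,
    pub status: i32,
    pub payload_len: u32,
}

impl RequestHeader {
    /// Packed encoding with no padding between fields (assumed little-endian).
    pub fn encode(&self) -> [u8; REQ_HEADER_SIZE] {
        let mut b = [0u8; REQ_HEADER_SIZE];
        b[0..8].copy_from_slice(&self.req_id.to_le_bytes());
        b[8..10].copy_from_slice(&self.method.to_le_bytes());
        b[10..14].copy_from_slice(&self.payload_len.to_le_bytes());
        b
    }

    pub fn decode(b: &[u8; REQ_HEADER_SIZE]) -> Self {
        RequestHeader {
            req_id: u64::from_le_bytes(b[0..8].try_into().unwrap()),
            method: u16::from_le_bytes(b[8..10].try_into().unwrap()),
            payload_len: u32::from_le_bytes(b[10..14].try_into().unwrap()),
        }
    }
}

impl ResponseHeader {
    pub fn encode(&self) -> [u8; RESP_HEADER_SIZE] {
        let mut b = [0u8; RESP_HEADER_SIZE];
        b[0..8].copy_from_slice(&self.req_id.to_le_bytes());
        b[8..12].copy_from_slice(&self.status.to_le_bytes());
        b[12..16].copy_from_slice(&self.payload_len.to_le_bytes());
        b
    }

    pub fn decode(b: &[u8; RESP_HEADER_SIZE]) -> Self {
        ResponseHeader {
            req_id: u64::from_le_bytes(b[0..8].try_into().unwrap()),
            status: i32::from_le_bytes(b[8..12].try_into().unwrap()),
            payload_len: u32::from_le_bytes(b[12..16].try_into().unwrap()),
        }
    }
}
```

Encoding field by field, rather than casting a struct to bytes, avoids the alignment padding a compiler would otherwise insert after the 2-byte method field.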

RPC Methods

The protocol defines 13 methods across four categories. Method IDs use hex ranges to group related operations:

Network Operations

Method        | ID     | Direction | Purpose
--------------+--------+-----------+--------------------------------
NetTcpListen  | 0x0100 | E→H       | Start listening on a TCP port
NetTcpAccept  | 0x0101 | E→H       | Accept a new TCP connection
NetTcpConnect | 0x0102 | E→H       | Open an outbound TCP connection
NetSend       | 0x0103 | E→H       | Send bytes on a TCP socket
NetRecv       | 0x0104 | E→H       | Receive bytes from a TCP socket
NetClose      | 0x0105 | E→H       | Close a TCP connection

Key-Value Store

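Method     | ID     | Direction | Purpose
-----------+--------+-----------+----------------------------
KvPut      | 0x0200 | E→H       | Store a key-value pair
KvGet      | 0x0201 | E→H       | Retrieve a value by key
KvDelete   | 0x0202 | E→H       | Delete a key
KvListKeys | 0x0203 | E→H       | List keys matching a prefix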

Utility

Method         | ID     | Direction | Purpose
---------------+--------+-----------+------------------------------
GetCurrentTime | 0x0300 | E→H       | Get current wall-clock time
Log            | 0x0301 | E→H       | Send a log message to the host

Lifecycle

Method   | ID     | Direction | Purpose
---------+--------+-----------+--------------------------------
Shutdown | 0xFF00 | E→H       | Shut down the enclave gracefully
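The method tables map naturally onto a Rust enum with explicit discriminants. The variant names and IDs are taken from the tables; the TryFrom conversion and the category helper (which reads the high byte of the ID) are hypothetical additions for illustration.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum Method {
    NetTcpListen = 0x0100,
    NetTcpAccept = 0x0101,
    NetTcpConnect = 0x0102,
    NetSend = 0x0103,
    NetRecv = 0x0104,
    NetClose = 0x0105,
    KvPut = 0x0200,
    KvGet = 0x0201,
    KvDelete = 0x0202,
    KvListKeys = 0x0203,
    GetCurrentTime = 0x0300,
    Log = 0x0301,
    Shutdown = 0xFF00,
}

impl TryFrom<u16> for Method {
    type Error = u16; // the unrecognized ID is handed back to the caller

    fn try_from(id: u16) -> Result<Self, u16> {
        use Method::*;
        Ok(match id {
            0x0100 => NetTcpListen,
            0x0101 => NetTcpAccept,
            0x0102 => NetTcpConnect,
            0x0103 => NetSend,
            0x0104 => NetRecv,
            0x0105 => NetClose,
            0x0200 => KvPut,
            0x0201 => KvGet,
            0x0202 => KvDelete,
            0x0203 => KvListKeys,
            0x0300 => GetCurrentTime,
            0x0301 => Log,
            0xFF00 => Shutdown,
            other => return Err(other),
        })
    }
}

impl Method {
    /// High byte of the ID: 0x01 network, 0x02 key-value, 0x03 utility, 0xFF lifecycle.
    pub fn category(self) -> u8 {
        ((self as u16) >> 8) as u8
    }
}
```

An explicit match rejects unknown IDs instead of transmuting an arbitrary u16 into the enum, which would be undefined behavior for values outside the 13 defined methods.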

Serialization

Request and response payloads are serialized using a simple length-prefixed binary format (not protobuf, not JSON). This keeps the serialization code minimal and avoids pulling in large dependencies inside the enclave.

For example, a KvPut request payload:

┌──────────────┬───────────┬──────────────┬─────────────┐
│ key_len: u32 │ key bytes │ val_len: u32 │ value bytes │
└──────────────┴───────────┴──────────────┴─────────────┘
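A KvPut payload in this format could be encoded and decoded roughly as follows. This is a sketch under assumptions: the length fields are taken to be little-endian (the text above does not state the byte order for payloads), fields are assumed to be shorter than 4 GiB, and the function names are hypothetical.

```rust
use std::convert::TryInto;

/// Append one length-prefixed field (u32 length, assumed little-endian).
fn put_field(out: &mut Vec<u8>, field: &[u8]) {
    // Assumes the field is shorter than 4 GiB, so the cast does not truncate.
    out.extend_from_slice(&(field.len() as u32).to_le_bytes());
    out.extend_from_slice(field);
}

/// Read one length-prefixed field; returns (field, remaining bytes).
fn take_field(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let len_bytes: [u8; 4] = buf.get(..4)?.try_into().ok()?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None; // declared length runs past the end of the payload
    }
    Some(rest.split_at(len))
}

pub fn encode_kv_put(key: &[u8], value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + key.len() + value.len());
    put_field(&mut out, key);
    put_field(&mut out, value);
    out
}

/// Returns (key, value), or None if the payload is truncated or has trailing bytes.
pub fn decode_kv_put(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let (key, rest) = take_field(buf)?;
    let (value, rest) = take_field(rest)?;
    if rest.is_empty() {
        Some((key, value))
    } else {
        None
    }
}
```

The decoder borrows the key and value from the input slice instead of copying them, so parsing adds no allocations on top of the two copies made by the queue.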

Performance Characteristics

Metric                              | Value
------------------------------------+------
ECALL/OCALL transitions per request | 0 for data transfer (after the initial ecall_run); ocall_notify is only a payload-free signal
Memory copies per message           | 2 (write into buffer, read out of buffer)
Lock contention                     | None (lock-free SPSC)
Memory ordering                     | Acquire/Release (minimal overhead)
Latency                             | Dominated by TLS and WASM execution, not by the queue

Because the SPSC queue costs only two memory copies and a few atomic loads and stores per message, the communication channel itself adds little overhead. The bottleneck is typically the useful work (TLS handshake, HTTP parsing, WASM execution) rather than the data transport.