Building trusted execution environments

rust
cryptography
nitro enclaves
aws
solana
evm
defi
nix
docker
ci/cd
linux

Polylayer is a trading protocol and platform for crypto, real-world assets, and prediction markets. From the user's perspective, the trading flow starts on the Solana side, in their Phantom wallet, and from there they can one-click trade across multiple venues and protocols. Beyond the interactive UI, the platform exposes a trading API and runs a server-side risk-management engine, so advanced orders (take-profits, stop-losses, scheduled rebalances) execute even when the user isn't logged in.

That product surface implies a constraint. The platform has to construct and sign EVM transactions on the user's behalf, across several chains, often while the user is asleep, and do that without ever taking custody of user funds. The user signs the Solana side themselves in Phantom; everything else (the per-user EVM addresses on Polygon and Arbitrum, the CCTP-bridged USDC, the Polymarket CTF positions) has to be signed server-side, but only when the user has explicitly authorized the specific action.

The standard solution to that constraint is a trusted execution environment (TEE): a hardware-isolated process that holds the sensitive key material and refuses to use it for anything outside a narrowly-defined policy. This post covers what a TEE is, why one was needed, what the available TEE-as-a-service products look like, and why the deployment ended up directly on AWS Nitro Enclaves after the platform options turned out to be either still in alpha or simply wrappers around the same underlying technology.

What a trusted execution environment is

A TEE is a chunk of compute, usually a CPU partition or a small VM, that the surrounding operating system cannot read from or write into. The hardware enforces the isolation. The contents of the TEE's memory aren't visible to the operating-system kernel, to the hypervisor, to other tenants on the same physical host, or to a privileged operator with root access on the host machine.

Isolation by itself isn't enough. A TEE also needs to be attestable: external parties have to be able to verify what program is running inside the TEE before they decide to trust it with secrets. The hardware vendor signs a statement (an attestation document) that contains a measurement of the running program (a hash, more or less) computed by the hardware itself during boot. Anyone who receives that document can verify the signature, look at the measurement, and decide whether the program matches what they expected.

The two properties together make a TEE useful for the signing-oracle shape. The signing key lives inside the TEE; the TEE proves to a key-release authority (here, AWS KMS) that it's running a specific, audited program; the key-release authority releases the wrapped key material only to a TEE running that exact program; the program inside the TEE then applies a policy, "only sign transactions matching a user-signed intent", that its auditable source code enforces. The operator of the host can't extract the key, because they can't read the TEE's memory. A different program can't extract the key either, because it would attest to a different measurement and the key-release authority would refuse.

The TEE landscape

For server workloads running in AWS, Nitro Enclaves are the practical option. Intel SGX is in the long fade-out; AMD SEV-SNP is hard to come by as a managed service; ARM TrustZone is the wrong technology for cloud-hosted servers. Nitro is the only one with a current published threat model, vendor support, ongoing patching, and an attestation chain rooted in a trusted authority (AWS) that the platform was already going to depend on for underlying infrastructure.

The Nitro attestation chain looks like this:

   AWS Nitro Root CA (out-of-band trust anchor)
        │ signs
        ▼
   per-instance Nitro attestation certificate
        │ signs
        ▼
   COSE_Sign1 attestation document
        │ contains
        ▼
   PCR0..PCR8  (image / kernel / boot measurements)
   public_key  (e.g., an ephemeral public key)
   user_data   (optional application-defined bytes)
   module_id   (identifies the issuing Nitro module)

Any party with the AWS Nitro root certificate can verify a Nitro attestation document, read out the PCR measurements, and decide whether to release secrets to the attesting enclave. This is the machinery on which everything downstream depends.
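The last step of that decision, comparing measured PCRs against an approved set, can be sketched in a few lines. This is a minimal illustration that assumes the COSE_Sign1 signature and certificate chain have already been verified elsewhere; the function name and the dictionary shape are hypothetical, not part of any AWS SDK.

```python
import hmac

PCR_LEN = 48  # Nitro PCR values are SHA-384 digests


def pcrs_match(measured: dict, expected: dict) -> bool:
    """Return True only if every expected PCR index is present and equal.

    `measured` is the {index: bytes} map read from an already-verified
    attestation document; `expected` is the approved allowlist.
    """
    if not expected:
        return False  # an empty policy should never release secrets
    for index, want in expected.items():
        got = measured.get(index)
        if got is None or len(got) != PCR_LEN:
            return False
        # Constant-time comparison; cheap insurance even for public values.
        if not hmac.compare_digest(got, want):
            return False
    return True
```

Only the indices named in `expected` are checked, so a verifier that pins PCR0 alone ignores the other registers, which mirrors how a policy keyed on a single PCR behaves.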

The provider survey

Several services offer Nitro Enclaves wrapped behind a developer-friendly facade. Four were evaluated:

   ┌─────────────────────────────────────────────────────────────────┐
   │  EigenCompute       Restaking-secured confidential compute.     │
   │                     Wraps Nitro under the hood. Decentralized   │
   │                     marketplace of operators, settlement and    │
   │                     slashing layer over EigenLayer.             │
   │                                                                 │
   │  Marlin Oyster      Open-source Nitro orchestration. Provides   │
   │                     a Nix flake reference design and a          │
   │                     decentralized control plane that schedules  │
   │                     jobs onto a network of operators.           │
   │                                                                 │
   │  Evervault          Closed-source. Hosted build pipeline and    │
   │                     billing layer over AWS Nitro Enclaves with  │
   │                     a developer-focused CLI.                    │
   │                                                                 │
   │  Fluence            Decentralized compute marketplace.          │
   │                     Confidential compute is one mode among      │
   │                     several; the backend technology depends on  │
   │                     the matched operator.                       │
   └─────────────────────────────────────────────────────────────────┘

Representative workloads ran on three of them (a small signing service shaped like a stripped-down version of what the final deployment looks like), and the fourth's documentation was read closely enough to decide it wouldn't fit. Two findings emerged from the evaluation: one architectural, one operational.

Architectural: the platforms are wrappers

Past the marketing layer, every platform's security boundary reduces to the same thing: AWS Nitro Enclaves running on AWS hardware, attested by the AWS Nitro root CA. The platform layers on top supply ergonomics, a build pipeline, sometimes a control plane, sometimes a settlement and slashing layer, but the cryptographic guarantees come from the underlying Nitro system. Measured boot, attestation signing, vsock-only I/O, the hypervisor isolation properties: all of these are properties of Nitro, not of the platform.

   ┌───────────────────────────────────────────────────────────────┐
   │  Application code                                             │
   ├───────────────────────────────────────────────────────────────┤
   │  Platform SDK / runtime / control plane (platform-specific)   │
   ├───────────────────────────────────────────────────────────────┤
   │  Enclave image format (EIF)                                   │
   ├───────────────────────────────────────────────────────────────┤
   │  nitro-cli                                                    │
   ├───────────────────────────────────────────────────────────────┤
   │  AWS Nitro Enclaves (vCPU + RAM partition, vsock channel)     │
   ├───────────────────────────────────────────────────────────────┤
   │  AWS Nitro System (hypervisor, attestation root CA)           │
   └───────────────────────────────────────────────────────────────┘

A platform deployment and a direct deployment offer identical cryptographic strength. The choice between them is not about security; it's about who operates the layer between Nitro and the application.

Operational: nobody publishes a contractual SLA

The platforms were attractive because they were supposed to abstract away operations. In practice, at the time of the evaluation, none of the four offered a contractual SLA covering either availability or cryptographic correctness at a level appropriate for live trading.

Beyond the absence of SLAs, each platform retained discretion over decisions that materially affected the deployment:

The choice of underlying Nitro instance class.
The cadence of base-image upgrades. The base image determines the PCR0 measurement, and the PCR0 measurement is what the KMS key-release policy matches on. A base-image change on the platform's side rotates the PCR0 out from under the application's KMS key policy, and Decrypt calls start failing with AccessDenied without warning.
Whether the platform orchestrator may migrate the enclave between hosts on its own schedule.
The lifecycle of the parent-side vsock proxy, on which every AWS API call from the enclave depends.
The recovery procedure during an AWS regional event, since the platform was operating its own control plane.

None of those decisions appeared anywhere a customer could contract on. Each one represented operational risk additional to the risk already accepted by running on Nitro at all.

For a signing oracle that needs to be available during fast-moving market hours, those characteristics were disqualifying. A best-effort uptime promise from an alpha-stage platform is not a reasonable foundation for a service that signs withdrawals from prediction markets and bridges custodial USDC across chains.

The decision

The TEE was deployed directly on AWS Nitro Enclaves rather than through any of the platforms surveyed. The reasoning was mechanical: cryptographic guarantees are unchanged whether the deployment runs through a platform or directly on Nitro, because every platform reduces to Nitro at the security layer. The operational guarantees, on the other hand, get worse under a platform, because the platform interposes an operator whose decisions can't be contracted on. Removing the platform layer removes a source of uncompensated operational risk while preserving the security model in full.

The cost of the decision was the engineering work to build the deployment directly. That work is contained:

   ┌──────────────────────────────────────────────────────────────┐
   │  1. EC2 instance with Nitro Enclaves enabled.                │
   │  2. nitro-cli + nitro-enclaves-allocator installed on the    │
   │     parent.                                                  │
   │  3. EIF built from a Docker (or Nix) image.                  │
   │  4. Enclave boots, listens on vsock.                         │
   │  5. Parent runs vsock-proxy systemd units so the enclave can │
   │     reach AWS APIs.                                          │
   │  6. nginx + certbot for the HTTPS front, addressed at a      │
   │     Let's Encrypt-signed domain.                             │
   └──────────────────────────────────────────────────────────────┘

Each of those steps has subtleties that surface during the first implementation. None of them are research problems; they're configuration, integration, and operational discipline. Once the work was done, the runtime was fully under our control: no platform orchestrator would migrate the enclave at an inconvenient moment; no base-image revision from a third party would shift the PCR0 out from under the KMS policy; the SLA was whatever we chose to commit to internally. The risk profile shifted from opaque dependence on a platform team still figuring out its own product to transparent dependence on AWS plus a small amount of code under direct control.

The rest of this post describes the deployment.

The attestation gate

Most of what's interesting in the design is downstream of a single property of AWS KMS: a key policy can include a condition keyed on the enclave's attestation document.

   ┌───────────────────────────────────────────────────────────────┐
   │  KMS key policy (paraphrased):                                │
   │                                                               │
   │    Allow kms:Decrypt                                          │
   │    If    kms:RecipientAttestation:PCR0 == <approved EIF hash> │
   └───────────────────────────────────────────────────────────────┘

PCR0 is a measurement of the enclave image. Identical image bytes produce an identical PCR0; any modification (a single byte of source code, a different build toolchain, a base-image upgrade) produces a different PCR0. When the enclave calls KMS Decrypt and includes its attestation document, KMS verifies that the document's signature chains to the Nitro root CA, reads the measured PCR0, and only releases plaintext if the PCR0 satisfies the policy condition.
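For concreteness, here is a rough sketch of what such a key-policy statement could look like, built as a Python dictionary and serialized to JSON. The condition key kms:RecipientAttestation:PCR0 comes from the paraphrased policy above; the statement ID, the role ARN, and the helper name are placeholders invented for illustration, and the real policy may carry additional statements and conditions.

```python
import json


def attested_decrypt_statement(role_arn: str, pcr0_hex: str) -> dict:
    """Build one key-policy statement gating kms:Decrypt on PCR0.

    PCR0 is a SHA-384 digest, so the hex form must be 96 characters.
    """
    if len(pcr0_hex) != 96 or any(c not in "0123456789abcdefABCDEF" for c in pcr0_hex):
        raise ValueError("PCR0 must be 96 hex characters (SHA-384)")
    return {
        "Sid": "AllowDecryptFromApprovedEnclave",  # hypothetical name
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",
        "Condition": {
            "StringEqualsIgnoreCase": {
                "kms:RecipientAttestation:PCR0": pcr0_hex
            }
        },
    }


if __name__ == "__main__":
    stmt = attested_decrypt_statement(
        "arn:aws:iam::111122223333:role/example-enclave-parent",  # placeholder
        "ab" * 48,  # placeholder measurement
    )
    print(json.dumps(stmt, indent=2))
```

Every image rebuild changes the PCR0 value, so in a setup like this the policy update has to ship alongside the new EIF.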

The master mnemonic from which per-user EVM keys are derived is stored as ciphertext in an S3 bucket under this key. The ciphertext is accessible to anyone with S3 read permissions on the bucket; it isn't hidden, and it doesn't need to be. On its own, the ciphertext is useless: only an enclave whose PCR0 satisfies the KMS policy can decrypt it.

       ┌────────────┐    ciphertext_blob     ┌────────────┐
       │     S3     │ ──────────────────────►│  Enclave   │
       │ sealed.bin │                        │  PCR0 = X  │
       └────────────┘                        └──────┬─────┘
                                                    │
                                                    │ Decrypt(blob,
                                                    │   attestation)
                                                    ▼
                                             ┌────────────┐
                                             │    KMS     │
                                             │  policy:   │
                                             │ X allowed? │
                                             └──┬──────┬──┘
                                            yes │      │ no
                                                ▼      ▼
                                          plaintext   AccessDenied
                                          to enclave  (parent root
                                                       also denied)

The parent EC2 instance has the IAM permission to call kms:Decrypt, but it cannot satisfy the policy condition: it cannot produce a valid Nitro attestation document, because attestation is a privileged operation accessible only to the running enclave's NSM (Nitro Security Module) device. The policy condition therefore protects the mnemonic from a compromised parent operator as well as from external attackers. Even an adversary with full root on the parent EC2 cannot decrypt the sealed mnemonic from S3.

Every subsequent design choice is downstream of this property. The remaining questions are: how does the attestation gate stay working through credential rotations, image rebuilds, and network configuration; and how does the program running behind the gate enforce a policy that the user can actually trust?

The enclave has no network

Nitro Enclaves don't get a network interface. The hypervisor exposes one I/O channel, vsock, a virtual socket addressed by a (context_id, port) tuple. The parent EC2 instance is always CID 3; the enclave's CID is chosen at launch time.

                ┌─────────────────────────────┐
                │   Parent EC2  (CID 3)       │
                │                             │
                │   eth0   ◄── internet ──►   │
                │   vsock interface           │
                └──────────┬─────────┬────────┘
                           │         │
                           │ vsock   │ vsock
                           ▼         ▼
                ┌─────────────────────────────┐
                │   Nitro Enclave  (CID 16)   │
                │                             │
                │   no eth0                   │
                │   only vsock I/O            │
                └─────────────────────────────┘

The lack of a network is a security feature. There's no NIC for an attacker to exfiltrate through, no socket the enclave can bind to listen on the public internet, no way for a compromise inside the enclave to reach the outside world directly. But the enclave still needs to call KMS to decrypt its own mnemonic, read sealed blobs from S3, and write session state to DynamoDB. Every one of those AWS API calls has to be tunneled through the parent.

AWS publishes a tool called vsock-proxy for the parent side. It listens on a vsock port on the parent and forwards each incoming vsock connection to a configured AWS endpoint over TCP. Inside the enclave, the AWS SDK is configured with endpoint_url overrides that keep the real AWS hostname but select a per-service port; DNS resolution is bent via /etc/hosts so that AWS hostnames resolve to 127.0.0.1; a small tokio task forwards each loopback connection to the corresponding vsock port.

   Inside the enclave:                       On the parent:

   AWS SDK KMS client                          vsock-proxy systemd unit
   └─► thinks it's calling                     └─► forwards vsock 3:8001
       kms.<region>.amazonaws.com:443              to the real
       └─► /etc/hosts redirects to                 kms.<region>
           127.0.0.1                                .amazonaws.com:443
           └─► endpoint_url is :8001
               └─► tokio bridge forwards
                   127.0.0.1:8001 to
                   vsock 3:8001  ────────────────────► outbound TLS to KMS

The TLS handshake remains end-to-end between the enclave and AWS. The parent sees only ciphertext on the wire. SNI carries the real AWS hostname, and AWS-issued certificates validate against that hostname. The routing is bent; the cryptography isn't.
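The enclave-side bridge is small enough to sketch. The deployment described here uses a tokio task in Rust; the version below swaps in Python threads purely to show the shape, and the function names are made up for illustration. It copies bytes in both directions without inspecting them, which is why TLS stays end-to-end.

```python
import socket
import threading

PARENT_CID = 3  # the parent instance's vsock context ID


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until EOF, then half-close dst."""
    while True:
        chunk = src.recv(65536)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # peer already gone


def bridge(a: socket.socket, b: socket.socket) -> None:
    """Relay traffic between two connected sockets in both directions."""
    back = threading.Thread(target=pump, args=(b, a), daemon=True)
    back.start()
    pump(a, b)
    back.join()
    a.close()
    b.close()


def serve(port: int) -> None:
    """Accept loopback TCP connections and relay each to vsock 3:<port>."""
    listener = socket.create_server(("127.0.0.1", port))
    while True:
        client, _ = listener.accept()
        upstream = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)  # Linux only
        upstream.connect((PARENT_CID, port))
        threading.Thread(target=bridge, args=(client, upstream), daemon=True).start()
```

Calling serve(8001) inside the enclave would pair with a parent-side vsock-proxy listening on vsock port 8001; one such listener runs per AWS service.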

Three implementation details about this setup are worth flagging, because each one is a load-bearing piece that's easy to miss the first time.

The first is that Docker treats /etc/hosts as a special path. A COPY instruction that places content there during build is silently replaced at runtime with a Docker-generated stub. The EIF builder honors that stub. The enclave's binary therefore writes its /etc/hosts entries directly at PID-1 startup, before any AWS SDK client is initialized.

The second is that the loopback interface boots down. Nitro enclaves have no init system; nothing brings lo up by default. TCP connects to 127.0.0.1 return ENETUNREACH until the binary explicitly runs ip link set lo up. The Dockerfile includes iproute2 for the ip binary, and the enclave's binary executes that command early in startup.

The third is that three AWS services can't share 127.0.0.1:443 on loopback inside the enclave. The deployment uses port :8001 for KMS, :8002 for S3, and :8003 for DynamoDB. The AWS SDK accepts endpoint_url overrides cleanly; SNI still ships the real AWS hostname, so end-to-end TLS is unaffected by the non-standard port.
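Put together, the hosts entries and the endpoint overrides both fall out of one port table. The sketch below is illustrative only: the port numbers are the ones listed above, the hostname pattern is the standard regional AWS endpoint form, and the function names are hypothetical. It assumes the endpoint override keeps the real hostname so that SNI and certificate validation are unaffected.

```python
# Port assignments from the deployment described above.
SERVICE_PORTS = {"kms": 8001, "s3": 8002, "dynamodb": 8003}


def aws_hostname(service: str, region: str) -> str:
    """Standard regional endpoint hostname, e.g. kms.us-east-1.amazonaws.com."""
    return f"{service}.{region}.amazonaws.com"


def hosts_file(region: str) -> str:
    """Contents to write to /etc/hosts at startup, before any SDK client exists."""
    lines = ["127.0.0.1 localhost"]
    lines += [f"127.0.0.1 {aws_hostname(s, region)}" for s in SERVICE_PORTS]
    return "\n".join(lines) + "\n"


def endpoint_urls(region: str) -> dict:
    """Per-service endpoint_url overrides: real hostname, non-standard port."""
    return {
        service: f"https://{aws_hostname(service, region)}:{port}"
        for service, port in SERVICE_PORTS.items()
    }


if __name__ == "__main__":
    print(hosts_file("us-east-1"), end="")
    for service, url in endpoint_urls("us-east-1").items():
        print(service, "->", url)
```

One caveat with this approach: S3 clients that use virtual-hosted-style addressing put the bucket name in the hostname, so either path-style addressing or an extra hosts entry per bucket would be needed.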

Getting the mnemonic into the enclave

The master mnemonic has to start somewhere. On the first ever boot of the deployment, when no sealed ciphertext exists in S3, the enclave generates a fresh BIP-39 mnemonic, encrypts it under the KMS key, and writes the ciphertext to S3.

        Enclave (first boot)              KMS             S3
              │
              │ generate mnemonic
              │ (24 BIP-39 words,
              │  256 bits entropy)
              │
              │── Encrypt(plaintext = mnemonic) ──►│
              │                                     │
              │◄────── ciphertext_blob ─────────────│
              │
              │── PutObject("master-seed.sealed") ─────────► │
              │                                              │
              │◄──────────── 200 OK ─────────────────────────│
              │
              │ keep plaintext in RAM,
              │ derive HKDF key store,
              │ start accepting signing requests

Encrypt isn't attestation-gated; any party with the IAM permission can encrypt under the key, including the parent operator. That asymmetry is intentional. The ability to encrypt reveals nothing about a plaintext the caller doesn't already know, and the parent never sees the plaintext that ends up inside the sealed ciphertext: the mnemonic is generated inside the enclave and never leaves the enclave's address space in plaintext form.

The interesting case is every subsequent boot, when the sealed ciphertext already exists. The enclave needs to read its own mnemonic plaintext back into memory, and the KMS key policy needs to enforce the attestation condition on that read. The mechanism is Decrypt with a Recipient parameter: the enclave generates an ephemeral RSA keypair using the NSM device, obtains an attestation document over the public key, and supplies both to KMS along with the ciphertext. KMS verifies the attestation, looks up the measured PCR0 against the policy, decrypts the ciphertext if the PCR0 is allowed, and re-wraps the plaintext for the supplied public key in a CMS EnvelopedData blob.

   Enclave (subsequent boot)        NSM             KMS               S3

         │── GenerateKeyPair ────────►│
         │◄── pubkey (RSA, DER) ──────│
         │
         │─ Attest(public_key = pubkey) ─►│
         │◄── attestation_doc ────────────│
         │     (COSE_Sign1, signed by
         │      Nitro root CA, embeds
         │      PCR0..PCR8 + pubkey)
         │
         │── GetObject("master-seed.sealed") ──────────────────────►│
         │◄────────────── ciphertext_blob ────────────────────────  │
         │
         │── Decrypt(ciphertext_blob,
         │           Recipient { attestation_doc, pubkey }) ─►│
         │
         │            KMS verifies:
         │              - attestation signed by Nitro root CA
         │              - PCR0 matches policy condition
         │              - on failure: AccessDenied
         │              - on success: re-wrap plaintext for pubkey
         │
         │◄── ciphertext_for_recipient (CMS EnvelopedData) ──│
         │
         │ local unwrap:
         │   1. BER → DER preprocess (KMS emits indefinite-
         │      length BER; the Rust CMS library requires DER)
         │   2. parse EnvelopedData, locate RecipientInfo
         │   3. RSA-OAEP-SHA256 unwrap of content-encryption key
         │      using the NSM private key
         │   4. AES-256-CBC decrypt of the inner content
         │   ─► mnemonic plaintext

The RSA keypair is fresh per boot. The private component exists only for the lifetime of the Decrypt call. There's no long-lived private material outside the mnemonic itself.

One step in this flow turned out to be more involved than it looks. The AWS KMS response for ciphertext_for_recipient uses indefinite-length BER encoding, legal under the ASN.1 spec but not accepted by the standard Rust CMS library, which only parses DER. A small preprocessor walks the BER stream, flattens indefinite-length containers, and joins constructed-form primitives into a single primitive value before handing the result to the CMS parser. Roughly three hundred lines of code, and the difference between an enclave that can read its own sealed mnemonic and one that can't.
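The core of such a preprocessor fits in a short sketch. The version below is a simplified Python illustration, not the production Rust code: it rewrites indefinite lengths as definite ones and joins universal constructed OCTET STRINGs (tag 0x24) into a single primitive. It does not handle context-tagged constructed strings, which need schema knowledge, and it does not enforce DER's other rules such as SET ordering.

```python
def _encode_length(n: int) -> bytes:
    """Definite-length encoding: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def _read_header(buf: bytes, pos: int):
    """Return (tag_bytes, length or None if indefinite, content_start)."""
    start = pos
    first = buf[pos]
    pos += 1
    if first & 0x1F == 0x1F:  # multi-byte tag number
        while buf[pos] & 0x80:
            pos += 1
        pos += 1
    tag = buf[start:pos]
    length_byte = buf[pos]
    pos += 1
    if length_byte == 0x80:
        return tag, None, pos
    if length_byte & 0x80:
        n = length_byte & 0x7F
        return tag, int.from_bytes(buf[pos:pos + n], "big"), pos + n
    return tag, length_byte, pos


def _convert(buf: bytes, pos: int):
    """Convert one TLV starting at pos; return (tag, content, next_pos)."""
    tag, length, pos = _read_header(buf, pos)
    if not tag[0] & 0x20:  # primitive: copy content through unchanged
        if length is None:
            raise ValueError("primitive value with indefinite length")
        return tag, buf[pos:pos + length], pos + length
    children = []
    if length is None:
        while buf[pos:pos + 2] != b"\x00\x00":  # end-of-contents marker
            child_tag, child_content, pos = _convert(buf, pos)
            children.append((child_tag, child_content))
        pos += 2
    else:
        end = pos + length
        while pos < end:
            child_tag, child_content, pos = _convert(buf, pos)
            children.append((child_tag, child_content))
    if tag == b"\x24":  # constructed OCTET STRING: join chunks into one primitive
        return b"\x04", b"".join(c for _, c in children), pos
    content = b"".join(t + _encode_length(len(c)) + c for t, c in children)
    return tag, content, pos


def ber_to_definite(buf: bytes) -> bytes:
    """Re-encode a single BER value using definite lengths throughout."""
    tag, content, end = _convert(buf, 0)
    if end != len(buf):
        raise ValueError("trailing bytes after top-level value")
    return tag + _encode_length(len(content)) + content
```

For example, a SEQUENCE holding a chunked OCTET STRING, `30 80 24 80 04 02 ab cd 04 01 ef 00 00 00 00`, comes out as `30 05 04 03 ab cd ef`.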

Intent-bound signing

The enclave doesn't sign only EVM transactions. It also signs Solana transactions, including the ones that make one-click trading on Solana venues possible. The two paths share the same security model: every signature emitted by the enclave is bound to an intent that the user authorized with their Phantom wallet, and the validator inside the enclave refuses to sign anything whose payload doesn't match what the intent describes.

The signing endpoint accepts three inputs: an intent (canonical JSON the user signed with their Solana key), the ed25519 signature over that intent, and an unsigned transaction. The transaction is either an EVM EIP-1559 tx (for Polygon, Arbitrum, etc.) or a Solana versioned transaction (for Jupiter perps, polyleverage sessions, etc.). The action field on the intent tells the enclave which validator to run.

The intent envelope:

   Intent (canonical JSON, signed with the user's Solana key):

      {
        "action": "withdraw",
        "version": 1,
        "solana_pubkey": "<user's base58 pubkey>",
        "chain_id": 137,
        "value": "0",
        "expires_at": 1716130000,
        "recipient": "0x…",
        "amount": "1000000",
        "amount_max": "1000000"
      }
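Signing JSON only works if both sides serialize it to exactly the same bytes. The post doesn't spell out the canonicalization rules, so the sketch below assumes one common choice (keys sorted, no insignificant whitespace, UTF-8 output); the real scheme may differ in details such as number formatting.

```python
import json


def canonical(intent: dict) -> bytes:
    """Serialize an intent to deterministic bytes for signing.

    Assumed rules: sorted keys, compact separators, UTF-8 encoding.
    """
    return json.dumps(
        intent,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")


if __name__ == "__main__":
    a = {"action": "withdraw", "version": 1, "amount": "1000000"}
    b = {"version": 1, "amount": "1000000", "action": "withdraw"}
    print(canonical(a))
    print(canonical(a) == canonical(b))  # key order does not matter
```

Carrying amounts as decimal strings, as the intent above does, sidesteps any ambiguity in how different JSON libraries print large integers or floats.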

The action field selects one of a fixed set of validators. The EVM-side validators cover USDC withdrawals via CCTP v2, Polymarket-to-Solana withdrawals, Hyperliquid-to-Solana withdrawals, Polymarket CTF share redemptions, NegRisk redemptions, ERC-20 transfers, and Onramp wraps. The Solana-side validators cover Jupiter perpetuals (open, close, take-profit, stop-loss), polyleverage session operations, Polymarket order signing (off-chain L2 auth headers + on-chain order payloads), Hyperliquid order signing and bridge permits, and a small set of attestor signatures used for resolutions, price TWAPs, and liquidations.

Each validator is a separate piece of code that knows the exact program ID (for Solana) or contract address (for EVM) it expects, the exact instruction layout or function selector, and how to decode payload bytes into typed arguments that can be compared field-by-field against the intent.

The validation pipeline:

   ┌──────────────────────────────────────────────────────────────┐
   │  1. canonical(intent) → bytes                                │
   │  2. ed25519_verify(intent_sig, bytes, intent.solana_pubkey)  │
   │     failure ─► 401                                           │
   │                                                              │
   │  3. intent.expires_at < now ?    ─► 410                      │
   │                                                              │
   │  4. lookup validator for intent.action                       │
   │     unknown action ─► 400                                    │
   │                                                              │
   │  5. decode the unsigned payload                              │
   │       EVM path: decode EIP-1559 tx                           │
   │         check: tx.to    == expected contract                 │
   │                tx.value == intent.value                      │
   │                selector == expected function                 │
   │                decoded args == intent claims                 │
   │       Solana path: decode VersionedTransaction               │
   │         check: program_id == expected program                │
   │                instruction discriminator matches             │
   │                decoded accounts == intent claims             │
   │                decoded args     == intent claims             │
   │       any mismatch ─► 400 with field identification          │
   │                                                              │
   │  6. derive the per-user signing key for this action:         │
   │       EVM      ─►  HKDF salt = "polylayer-tee-v1"            │
   │       Solana   ─►  HKDF salt = "polylayer-jup-delegate-v1"   │
   │                    or         "polylayer-session-v1"         │
   │                    (Jupiter perps vs polyleverage session)   │
   │                                                              │
   │  7. sign the transaction                                     │
   │       EVM    : secp256k1 over keccak256(tx_rlp)              │
   │       Solana : ed25519   over the message bytes              │
   │  8. return { signed_tx, tx_hash, signer_address }            │
   └──────────────────────────────────────────────────────────────┘

Step 5 is the load-bearing step. The validator decodes the actual payload bytes the caller is asking to be signed, against the program or contract the action targets, and compares each decoded field against the corresponding field of the intent. The amount_max field provides an upper bound rather than an equality constraint, which accommodates slippage; every other field has to match exactly.

The trading lambda, the service that submits signing requests to the enclave, has no privileged authority. The only way to extract a signature is to present an intent the user signed, paired with a transaction whose decoded payload is consistent with that intent. A compromised lambda that tried to substitute a different recipient, a higher amount, a different program, or a different instruction would be rejected at step 5 with a 400 response indicating which field failed.
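A compressed sketch of steps 2 through 5 for the EVM path follows. Everything named here is illustrative: the contract address and validator table are placeholders, signature verification is passed in as a boolean because the Python standard library has no ed25519, and the transaction is assumed to be already decoded into typed fields. How the intent's amount field relates to amount_max isn't specified above, so the sketch enforces only the upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedEvmTx:
    """Fields a validator would extract from the unsigned EIP-1559 tx."""
    chain_id: int
    to: str
    value: int
    selector: bytes
    recipient: str
    amount: int


# Hypothetical validator table: action -> (expected contract, expected selector).
VALIDATORS = {
    "withdraw": ("0x" + "11" * 20, bytes.fromhex("a9059cbb")),
}


def validate(intent: dict, tx: DecodedEvmTx, sig_ok: bool, now: int):
    """Return (http_status, detail); detail names the failing field."""
    if not sig_ok:
        return 401, "intent_sig"
    if intent["expires_at"] < now:
        return 410, "expires_at"
    spec = VALIDATORS.get(intent["action"])
    if spec is None:
        return 400, "action"
    contract, selector = spec
    checks = [
        ("chain_id", tx.chain_id == intent["chain_id"]),
        ("tx.to", tx.to.lower() == contract.lower()),
        ("tx.value", tx.value == int(intent["value"])),
        ("selector", tx.selector == selector),
        ("recipient", tx.recipient.lower() == intent["recipient"].lower()),
        # amount_max is an upper bound, not an equality constraint.
        ("amount", 0 < tx.amount <= int(intent["amount_max"])),
    ]
    for field, ok in checks:
        if not ok:
            return 400, field
    return 200, "ok"
```

A caller that swaps the recipient gets back (400, "recipient"), which is the field-identifying rejection described above; key derivation and signing only happen after a 200.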

Per-user signing keys, both EVM and Solana, are derived deterministically from the master seed and the user's Solana pubkey. The same (solana_pubkey, master_mnemonic) always produces the same set of derived keys:

   EVM (per-user EOA, used for Polygon, Arbitrum, etc.):

   evm_privkey_v1(solana_pubkey) =
     HKDF-SHA256(
       ikm   = master_seed_bytes (64 bytes from BIP-39 seed),
       salt  = "polylayer-tee-v1" (UTF-8),
       info  = solana_pubkey_bytes (32 bytes raw),
       len   = 32
     )
   evm_address = keccak256(secp256k1_pubkey(evm_privkey_v1))[12..]


   Solana (per-user delegate, used for Jupiter perps and
   polyleverage sessions; one keypair per (user, salt) pair):

   solana_delegate(solana_pubkey, salt) =
     HKDF-SHA256(
       ikm   = master_seed_bytes,
       salt  = salt,
       info  = solana_pubkey_bytes,
       len   = 32
     )
   used as the ed25519 seed; pubkey = ed25519_pubkey(seed).

The version string in each salt is a flag for future rotation. A v2 derivation would change the salt, derive different keys, and require an explicit migration. Everything in production today is v1.
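The derivation itself needs nothing beyond HMAC-SHA256. The sketch below implements HKDF per RFC 5869 with the standard library and applies it with the salts quoted above; the function names are illustrative, the inputs in the usage example are dummy bytes, and the secp256k1 and ed25519 public-key steps are left out because they need third-party libraries.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("requested output too long for HKDF-SHA256")
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def evm_privkey_v1(master_seed: bytes, solana_pubkey: bytes) -> bytes:
    """32 bytes of key material for the per-user EVM account."""
    return hkdf_sha256(master_seed, b"polylayer-tee-v1", solana_pubkey)


def solana_delegate_seed(master_seed: bytes, solana_pubkey: bytes, salt: bytes) -> bytes:
    """32-byte ed25519 seed for a per-user delegate under the given salt."""
    return hkdf_sha256(master_seed, salt, solana_pubkey)


if __name__ == "__main__":
    seed = bytes(64)           # dummy stand-in for the 64-byte BIP-39 seed
    user = bytes(range(32))    # dummy stand-in for a raw Solana pubkey
    print(evm_privkey_v1(seed, user).hex())
    print(solana_delegate_seed(seed, user, b"polylayer-jup-delegate-v1").hex())
```

One detail a real implementation has to handle: a secp256k1 private key must be a nonzero integer below the curve order, so the 32 bytes should be range-checked before use, even though an out-of-range value is vanishingly unlikely.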

Per-user delegates: Jupiter perps and polyleverage

EVM trading is the easy case. The TEE derives a per-user EOA from the master mnemonic, that EOA holds USDC at the user's request, the user signs intents that authorize specific transactions, and the TEE checks the calldata and signs. The user never has to sign an on-chain transaction themselves; signing the intent is enough, and the EOA's private key never exists outside the enclave.

Solana is shaped differently and needs a different approach. Solana programs natively support account-level delegation: an account owner can grant a specific delegate keypair authority to act on a token account or program-specific position. The platform leans into that primitive instead of fighting it.

The flow looks like this:

The TEE derives a per-user Jupiter delegate keypair (HKDF from the master seed + the user's Solana pubkey, with the Jupiter salt). The pubkey is exposed via /v1/jupiter/delegate?solana=<pubkey>; the private key never leaves the enclave.
The user does a one-time on-chain operation in Phantom that delegates a specific slice of their wallet authority to that delegate pubkey (for example, a perps position account or a USDC token-account approval scoped to Jupiter).
The user registers a session in the TEE by signing a jupiter_session_register intent. The intent declares bounds:

   Intent (canonical JSON, signed with the user's Solana key):

      {
        "action": "jupiter_session_register",
        "version": 1,
        "solana_pubkey": "<user's base58 pubkey>",
        "per_intent_max_size_usdc": "10000000000",
        "cumulative_size_usdc_cap": "100000000000",
        "allowed_assets": ["SOL-PERP", "BTC-PERP"],
        "expires_at_slot": 350000000
      }

The enclave stores this in its SQLite session store. From then on, the TEE will only sign Jupiter txs that fit within those bounds, and atomically debits the cumulative counter on every successful sign.

To place a trade, the user (or the bearer-token API client acting on their behalf) sends an unsigned Jupiter tx + an intent describing what the tx should do. The enclave decodes the tx against the Jupiter program ID, checks every account and argument against the intent, checks the trade size against per_intent_max_size_usdc, atomically checks-and-debits the cumulative counter against cumulative_size_usdc_cap, and only then signs with the delegate. If the cumulative debit would exceed the cap, the enclave returns 409 and refuses to sign.
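The check-and-debit step is the part that has to be atomic, and a single conditional UPDATE in SQLite is one way to get that. The sketch below is a guess at the shape rather than the actual schema: the table and column names are invented, and only the 409 on cap exhaustion comes from the description above (the 400 and 404 codes are assumptions).

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """In-memory stand-in for the enclave's session store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sessions ("
        " solana_pubkey TEXT PRIMARY KEY,"
        " per_intent_max INTEGER NOT NULL,"
        " cumulative_cap INTEGER NOT NULL,"
        " cumulative_used INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def register(db, pubkey: str, per_intent_max: int, cumulative_cap: int) -> None:
    with db:
        db.execute(
            "INSERT INTO sessions (solana_pubkey, per_intent_max, cumulative_cap)"
            " VALUES (?, ?, ?)",
            (pubkey, per_intent_max, cumulative_cap),
        )


def check_and_debit(db, pubkey: str, size: int) -> int:
    """Debit `size` from the session if both bounds hold; return an HTTP status."""
    if size <= 0:
        return 400
    with db:  # one transaction; the WHERE clause makes check and debit one step
        cur = db.execute(
            "UPDATE sessions SET cumulative_used = cumulative_used + ?"
            " WHERE solana_pubkey = ?"
            " AND ? <= per_intent_max"
            " AND cumulative_used + ? <= cumulative_cap",
            (size, pubkey, size, size),
        )
    if cur.rowcount == 1:
        return 200
    row = db.execute(
        "SELECT per_intent_max FROM sessions WHERE solana_pubkey = ?", (pubkey,)
    ).fetchone()
    if row is None:
        return 404  # no such session (assumed status)
    if size > row[0]:
        return 400  # over the per-intent limit (assumed status)
    return 409      # cumulative cap would be exceeded
```

Because the bounds live in the WHERE clause, two concurrent requests cannot both pass the check against the same remaining headroom; the second one simply matches zero rows.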

The point of the session-bounded model is that one-click trading and server-side automated execution work without trusting the trading lambda. If the lambda is fully compromised, the worst an attacker can do is drain up to the cumulative cap into trades within the allowed-asset set. They can't move funds to an attacker-controlled wallet, can't trade outside the whitelist, and can't extend the session expiry.

Polyleverage uses a session model with a different shape. Instead of in-wallet delegation, the user creates an on-chain Session PDA (via Phantom-signed CreateSession ix) and deposits funds into it. The Session PDA's authority is the polyleverage delegate keypair, which is what the TEE signs from. Everything else (per-intent caps, cumulative caps, allowed instruments, slot expiry) looks the same. Liquidation and resolution operations on polyleverage positions are signed by a shared attestor keypair inside the TEE (SolanaAttestorKeypair); the attestor only signs an attestation if the TEE can independently verify the breach mark or resolution outcome from on-chain state and external price feeds.

For programmatic clients, there's a parallel path that bypasses the per-trade ed25519 signature. The user creates a session with surface = "api" or surface = "both", and the TEE mints a bearer API key tied to that session. The key is held in an AES-256-GCM-encrypted escrow in DynamoDB; the lambda routes verify it against the TEE's session store on every call. With the bearer key, an external algo or trading bot can post /api/v1/jupiter/positions/{open,modify,close} requests directly, with no Phantom prompt per trade. The same session bounds apply: every trade still gets validated and debited inside the enclave.

If a session is compromised or a key leaks, the user revokes on-chain (or via /v1/jupiter/session/revoke); a drain endpoint sweeps the delegate's funds back to the main wallet. The blast radius is one session's cumulative cap, not the user's whole wallet.

Hyperliquid and Polymarket: the signing surface

Hyperliquid is different again. HL orders aren't on-chain Solana or EVM transactions; they're signed actions submitted to HL's L1 sequencer over their HTTP API. The signature format is HL-specific: EIP-712-style typed data over their action structure, signed by the user's Arbitrum address (which is the per-user EVM key the TEE derives via HKDF).

The TEE has validators that decode HL's place, cancel-by-cloid, and modify action shapes, check the order parameters against the intent's claims (asset, side, size, price bounds, time-in-force), and sign with the per-user EVM key. The HL Bridge2 deposit path adds an EIP-2612 permit signature for batchedDepositWithPermit. The user pre-authorizes the deposit, the TEE signs the permit, and the cron submits it to Arbitrum once USDC has arrived at the user's EOA.
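The "check the order parameters against the intent's claims" step might look roughly like the following. It is a sketch under stated assumptions: IntentClaims, DecodedOrder, Mismatch, and validate_order are hypothetical names, prices and sizes are modeled as fixed-point integers with an unspecified scale, and the real validators also have to decode HL's wire format, which is omitted here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side { Buy, Sell }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeInForce { Gtc, Ioc, Alo }

/// What the user-signed intent claims the order will be (hypothetical layout).
#[derive(Debug, Clone)]
pub struct IntentClaims {
    pub asset: String,
    pub side: Side,
    pub max_size: u64,
    pub min_price: u64,
    pub max_price: u64,
    pub tif: TimeInForce,
}

/// What the enclave decoded from the action payload (hypothetical layout).
#[derive(Debug, Clone)]
pub struct DecodedOrder {
    pub asset: String,
    pub side: Side,
    pub size: u64,
    pub limit_price: u64,
    pub tif: TimeInForce,
}

/// The first field that disagrees with the intent.
#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch { Asset, Side, Size, Price, TimeInForce }

/// Refuse unless every decoded field fits what the intent claims.
pub fn validate_order(claims: &IntentClaims, order: &DecodedOrder) -> Result<(), Mismatch> {
    if order.asset != claims.asset {
        return Err(Mismatch::Asset);
    }
    if order.side != claims.side {
        return Err(Mismatch::Side);
    }
    // A zero-size order is rejected as well as an oversized one.
    if order.size == 0 || order.size > claims.max_size {
        return Err(Mismatch::Size);
    }
    if order.limit_price < claims.min_price || order.limit_price > claims.max_price {
        return Err(Mismatch::Price);
    }
    if order.tif != claims.tif {
        return Err(Mismatch::TimeInForce);
    }
    Ok(())
}
```

Only an Ok(()) from a check like this would let the request proceed to signing with the per-user EVM key; any mismatch stops before key derivation.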

Polymarket order signing is also EIP-712 typed data, but the trade lifecycle requires more than just an order signature. Polymarket's V2 CLOB authenticates clients with per-user API keys + HMAC L2 headers on every request. Those credentials are minted by the TEE on first use and never leave the enclave; the TEE assembles each request's L2 headers internally and returns them alongside the signed order. The lambda forwards both to the CLOB. Polymarket is the venue where AOE-driven conditional orders are most mature today: a fleet of per-user Fly.io workers subscribes to the Polymarket WebSocket, evaluates triggers (sustained price hits, watchlist ranks, market flips) against live order book updates, and fires placement or cancellation actions through the same TEE-signed order path.

For Jupiter and polyleverage, the TEE-side primitives are in place: per-user delegates, session bounds, atomic cumulative debits, sign endpoints, drain and revoke. The orchestration layer that watches positions and fires conditional orders for those venues is still being built out. The signing primitive is venue-agnostic; what's left is wiring action handlers in the AOE worker and a margin/liquidation monitoring loop for polyleverage.

Keeping the credentials fresh

The AWS SDK clients inside the enclave need credentials. EC2 instances ordinarily obtain credentials from IMDS at 169.254.169.254, but the enclave can't reach IMDS: it has no network interface, so this, like every other network-shaped dependency, has to go over vsock. A small daemon on the parent, imds-bridge.py, listens on vsock 3:9100; on each accept it reads fresh STS credentials from IMDSv2 and writes them, alongside the API bearer token for the trading lambda, as one JSON object to the enclave.

       ┌────────────────────────────────────────────────┐
       │  imds-bridge.py  (on the parent)               │
       │                                                │
       │  listen on vsock 3:9100                        │
       │                                                │
       │  on accept:                                    │
       │     read STS creds from IMDSv2                 │
       │     write JSON { access_key_id,                │
       │                  secret_access_key,            │
       │                  session_token,                │
       │                  expires_at,                   │
       │                  admin_token }                 │
       │     close                                      │
       └────────────────────────────────────────────────┘

STS instance-role credentials expire after about six hours. The first version of the deployment fetched once at boot and treated the credentials as static. That worked in a strict sense (the system stayed correct), but every six hours the AWS SDK would start returning ExpiredToken, the enclave's systemd unit would hit its Restart=on-failure handler, and a fresh fetch at boot would recover. The behavior was reliable but it generated an alert on every restart cycle, and the alert was real enough that an operator had to look at it before clearing it.

The current implementation registers a ProvideCredentials adapter with the AWS SDK's identity cache, configured with a one-hour buffer time. The adapter performs a fresh vsock round-trip on each invocation:

       ┌────────────────────────────────────────────────┐
       │  Boot                                          │
       │                                                │
       │  VsockCredsProvider.fetch()                    │
       │     ───►  credentials valid for ~6h            │
       │  identity_cache schedules refresh at           │
       │     (expires_at - 1h)                          │
       └───────────────────────┬────────────────────────┘
                               │
                               │  ~5 hours later
                               ▼
       ┌────────────────────────────────────────────────┐
       │  identity_cache fires                          │
       │                                                │
       │  VsockCredsProvider.provide_credentials()      │
       │     ───►  fresh credentials                    │
       │  identity_cache swaps the cached value         │
       │  in-flight requests retry transparently        │
       └────────────────────────────────────────────────┘
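The timing in the diagram comes down to one comparison: refresh once the remaining lifetime is no longer than the buffer. The sketch below shows only that decision. CachedCreds and needs_refresh are hypothetical names, the secret fields are omitted, and none of this is the AWS SDK's own API.

```rust
use std::time::{Duration, SystemTime};

/// The part of the vsock payload that matters for refresh timing.
/// Hypothetical type; the key material is left out of the sketch.
pub struct CachedCreds {
    pub expires_at: SystemTime,
}

/// True once `now` is inside the buffer window before expiry, or past expiry.
pub fn needs_refresh(creds: &CachedCreds, now: SystemTime, buffer: Duration) -> bool {
    match creds.expires_at.duration_since(now) {
        // Still valid: refresh only if the remaining lifetime fits in the buffer.
        Ok(remaining) => remaining <= buffer,
        // duration_since fails when `now` is later than `expires_at`.
        Err(_) => true,
    }
}
```

With credentials valid for six hours and a one-hour buffer, this returns false for the first five hours and true from then on, which matches the "~5 hours later" step in the diagram.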

The visible effect is that the deployment no longer restarts on credential expiry. About a hundred lines of Rust, and an operational paper cut that had been present since the first deploy was gone.

A real domain and a real cert

For the first few weeks of operation, the enclave was addressed by raw IP, with nginx terminating TLS using a self-signed certificate. Both clients, the trading lambda and the Next.js-side helpers, had to opt out of certificate validation on every TEE call:

       const dispatcher = new Agent({
         connect: { rejectUnauthorized: false },   // ← not ideal
       });

That worked, but it left a rejectUnauthorized: false in two production codepaths, the kind of thing that survives in a codebase for years and confuses every code reviewer who comes across it. The cleaner setup, once a domain was available, was to point tee.polylayer.xyz at the parent's elastic IP and issue a Let's Encrypt certificate against it.

The mechanics are standard:

   1. Route 53 A record:    tee.polylayer.xyz → <elastic-ip>
   2. nginx vhost on the parent:
        :80 location /.well-known/acme-challenge/ for ACME
        :443 SSL, proxying to localhost:8080 (the socat bridge)
   3. certbot --nginx -d tee.polylayer.xyz
        verifies via HTTP-01 over port 80
        writes /etc/letsencrypt/live/tee.polylayer.xyz/{fullchain,privkey}.pem
        rewrites the :443 vhost to point at the issued cert
   4. systemctl enable certbot-renew.timer
        fires twice daily; renews any cert within 30 days of
        expiry; reloads nginx after each successful renewal

One detail worth flagging is that port 80 has to remain reachable from the public internet for the lifetime of the deployment. Let's Encrypt's HTTP-01 challenge is plain HTTP and always connects on port 80; there is no alternative transport for it. If port 80 were closed at the security group, renewal would silently fail and the cert would expire at the end of its 90-day validity. The CDK security group exposes :80 to 0.0.0.0/0 for exactly this reason. Outside the challenge window, the :80 vhost issues a 301 redirect to HTTPS, so there's no useful surface for a probing attacker.

A small CDK detail surfaced during the setup: AWS security-group rule descriptions don't accept apostrophes. The original description, "Let's Encrypt HTTP-01 challenge", was rejected by the EC2 API; the description in production reads "Lets Encrypt HTTP-01 challenge". Not a security property, just a footnote.

Reproducible builds

The KMS policy is keyed on PCR0, which is a hash of the EIF. Reproducibility matters because PCR0 is also the only way an external reviewer can verify that the production enclave is running the code published in source form. The reviewer rebuilds the code from source on their own machine, computes the PCR0, and compares it against the value in the KMS policy. That comparison is meaningful only if the build is deterministic; a build that produces different bytes on different machines makes the verification chain useless.

The deployment supports two build paths:

   Path A: Docker     fast iteration on a single Linux host
                      not reproducible across machines
                      glibc base image, no SOURCE_DATE_EPOCH
                      current production PCR0: f6f4850c…

   Path B: Nix flake  reproducible
                      SOURCE_DATE_EPOCH pinned, lockfile-pinned
                      nixpkgs + rust-overlay, single codegen
                      unit, deterministic linker
                      Nix-built PCR0: eb6a0bed…

Production currently runs the Docker-built image. The Nix-built image exists, is verified, and is a single-line ENCLAVE_PCR0=… CDK redeploy away from being the live PCR0. That swap is scheduled for the next release window with no in-flight signing volume.

Getting the Nix build to a deterministic state took longer than expected. Pure-evaluation mode forbids reading paths outside the flake's directory, which meant flake.nix had to move from the nix/ subdirectory up to the workspace root so that it could reference ./Cargo.lock without escaping its source slice. The default rustPlatform provided by nixpkgs uses an older rustc than the workspace's 1.92 minimum required version; a custom rustPlatform constructed via makeRustPlatform was required. The cargoBuildHook adds --profile release automatically, which conflicts with any explicit --release in cargoBuildFlags. Each of these was a small fix, but the rebuild cycle on a c6i.xlarge is long enough that they accumulated.

Two consecutive nix build .#docker-image invocations on the same host produced byte-identical outputs at every level:

   Artifact        Identifier
   Nix store path  /nix/store/fral84b7hjss6pbh1m1iw44l6mv3v3ah-polylayer-tee.tar.gz
   Docker image    sha256:b9dc633ff8fe…7deac9cf318d8fa5c
   EIF PCR0        eb6a0bed1c629a…f941a098ab71b1f18583d2092f2f58

That demonstrates same-host reproducibility. Cross-machine reproducibility (an independent verifier on a clean Nix install rebuilding from the same flake and arriving at the same PCR0) has not yet been verified. The flake.lock pins every input by content hash, which is necessary but not sufficient. The verification is straightforward and is scheduled work.

Upgradeability and the CI/CD path

The deployment is upgradeable, but the upgrade shape is constrained by attestation. Every byte of source code change produces a new PCR0; every new PCR0 needs to be in the KMS key policy before the resulting enclave can decrypt the sealed mnemonic. The KMS policy supports two PCR0s at a time (enclavePcr0 + enclavePcr0Previous), so a rolling deploy can keep the old enclave decrypting while the new one is verified. Upgrades are zero-downtime as long as the deploy follows that ordering.
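The two-slot arrangement can be modeled as a small allow-list that rotates on each deploy. This is an illustrative sketch only: the real check is enforced by KMS against the attestation document, not by application code, and KmsPcrPolicy, parse_pcr0, allows, and rotate are hypothetical names. The one fixed fact it relies on is that a Nitro PCR is a SHA-384 digest, so 48 bytes or 96 hex characters.

```rust
/// A PCR0 measurement: SHA-384, so 48 bytes (96 hex characters).
pub type Pcr0 = [u8; 48];

/// Parse a 96-character hex string, accepting either letter case.
pub fn parse_pcr0(hex: &str) -> Option<Pcr0> {
    let hex = hex.trim();
    // Checking for hex digits up front also guarantees ASCII, so the byte
    // slicing below cannot split a multi-byte character.
    if hex.len() != 96 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 48];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Model of the two PCR0 slots (enclavePcr0 + enclavePcr0Previous).
pub struct KmsPcrPolicy {
    pub current: Pcr0,
    pub previous: Option<Pcr0>,
}

impl KmsPcrPolicy {
    /// An enclave may decrypt if its measurement matches either slot.
    pub fn allows(&self, measured: &Pcr0) -> bool {
        self.current == *measured || self.previous.as_ref() == Some(measured)
    }

    /// Rolling deploy: the old `current` moves into the `previous` slot,
    /// and whatever was in `previous` falls out of the policy.
    pub fn rotate(&self, new: Pcr0) -> KmsPcrPolicy {
        KmsPcrPolicy { current: new, previous: Some(self.current) }
    }
}
```

The rotate step is why ordering matters: after one rotation both the old and the new enclave can decrypt, and after a second rotation the oldest image is locked out.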

The kinds of changes that actually need a TEE upgrade are narrow. SDK version bumps on the venue side don't reach the enclave at all, since the enclave only sees the cryptographic envelope (function selector + decoded args for EVM, instruction discriminator + decoded accounts for Solana). What does require an upgrade is: a contract address change, a function selector change, an EIP-712 or L1-action struct shape change, a new venue, or a new action type. Anything else is pure lambda-side work and deploys in minutes without touching the TEE.
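Written as a rule, the split above is a single exhaustive match. The Change enum and requires_enclave_upgrade below are hypothetical names used only to restate the list; they are not part of the deployment.

```rust
/// Kinds of change, following the categories named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    VenueSdkVersionBump,
    ContractAddress,
    FunctionSelector,
    /// An EIP-712 or L1-action struct shape change.
    TypedDataStructShape,
    NewVenue,
    NewActionType,
    /// Anything else: pure lambda-side work.
    LambdaOnly,
}

/// True when the change alters what the enclave validates or signs, and
/// therefore produces a new EIF and a new PCR0.
pub fn requires_enclave_upgrade(change: Change) -> bool {
    match change {
        Change::ContractAddress
        | Change::FunctionSelector
        | Change::TypedDataStructShape
        | Change::NewVenue
        | Change::NewActionType => true,
        Change::VenueSdkVersionBump | Change::LambdaOnly => false,
    }
}
```

Because the match has no wildcard arm, adding a new variant forces an explicit decision about which side of the line it falls on.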

CI lives in a single GitHub Actions workflow at .github/workflows/tee-eif.yml. On every push or pull request that touches eigen-tee-rust/, the workflow runs:

   ┌──────────────────────────────────────────────────────────────┐
   │  job: test                                                   │
   │    cargo test --workspace --locked                           │
   │    cargo clippy --workspace -- -D warnings                   │
   │                                                              │
   │  job: nix-build  (needs: test)                               │
   │    nix build .#docker-image                                  │
   │    capture Nix store path + Docker tarball sha256            │
   │    upload tarball as artifact                                │
   │    write summary table with both hashes                      │
   └──────────────────────────────────────────────────────────────┘

The unit tests catch validator decoding bugs (intent fields decoded against the wrong calldata offset, etc.), intent canonicalization regressions (whitespace or key-ordering drift that would break Phantom-signed intents), and HKDF / signing fixture parity. The Nix build catches reproducibility regressions: if a dependency update breaks deterministic compilation, the next CI run notices, not the next deploy. The Docker tarball is uploaded as a workflow artifact so the parent can curl it instead of rebuilding from scratch.
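A canonicalization regression test of the kind described might pin the exact bytes of a known intent. The sketch below assumes one possible canonical form (keys sorted lexicographically, no whitespace); the text does not specify the actual rules, and Value, quote, and canonicalize are hypothetical names rather than the enclave's implementation.

```rust
use std::collections::BTreeMap;

/// The value shapes that appear in the example intent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Str(String),
    Int(u64),
    List(Vec<String>),
}

/// Minimal JSON string quoting: escapes quotes, backslashes, and control
/// characters. Enough for the sketch, not a full JSON encoder.
fn quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render a flat object with sorted keys and no whitespace (assumed form).
pub fn canonicalize(fields: &BTreeMap<String, Value>) -> String {
    let body: Vec<String> = fields
        .iter() // BTreeMap iterates in sorted key order
        .map(|(k, v)| {
            let rendered = match v {
                Value::Str(s) => quote(s),
                Value::Int(n) => n.to_string(),
                Value::List(items) => {
                    let inner: Vec<String> = items.iter().map(|s| quote(s)).collect();
                    format!("[{}]", inner.join(","))
                }
            };
            format!("{}:{}", quote(k), rendered)
        })
        .collect();
    format!("{{{}}}", body.join(","))
}
```

A fixture test then asserts the output against a stored string byte for byte. Since the signature covers those bytes, any drift in whitespace or key order would invalidate every Phantom-signed intent, which is why the test compares exact strings rather than parsed values.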

The actual EIF assembly step still happens on the parent. CI doesn't produce an EIF because nitro-cli build-enclave needs the nitro-enclaves CLI toolchain, which is straightforward on Amazon Linux but awkward on a generic GitHub-hosted Ubuntu runner. Keeping that step on the parent also has a useful property: the same Linux host that will run the enclave is the one that produced the EIF, so there's no cross-host reproducibility caveat in the deploy path.

A script orchestrates the operator-side cutover:

   ┌──────────────────────────────────────────────────────────────┐
   │  1. nix build .#docker-image       (on operator's box)       │
   │  2. docker load + tag                                        │
   │  3. ssh parent:                                              │
   │       nitro-cli build-enclave                                │
   │       nitro-cli describe-eif       → captures new PCR0       │
   │  4. PAUSE, operator updates CDK:                             │
   │       ENCLAVE_PCR0=<new>                                     │
   │       ENCLAVE_PCR0_PREVIOUS=<current production>             │
   │       cd eigen-tee-rust/cdk && npm run deploy                │
   │  5. ssh parent:                                              │
   │       systemctl restart polylayer-tee-enclave                │
   │  6. curl https://tee.polylayer.xyz/healthz                   │
   └──────────────────────────────────────────────────────────────┘

The reason step 4 is preserved as a human pause rather than fully automated is policy. The KMS attestation gate is the linchpin of the entire security model; the principle that "the gate protecting everything else shouldn't be tunable by the same pipeline that produces the thing it's gating" is worth keeping. Automating it would mean handing the CI workflow IAM permission to update KMS key policies, which collapses two independent trust boundaries into one. The manual step is a human eyeballing two PCR0 hashes for equality: the new one from nitro-cli describe-eif against the new one CI computed during the Nix build. It takes five seconds, and it preserves the property that "the same actor cannot both build the enclave and bless it."

End-to-end deploy time for a small change is well under fifteen minutes. The Rust build with lto=fat is the slowest step at five to seven minutes on a c6i.xlarge; CI runs the same build in parallel, so by the time the parent finishes its build, CI's PCR0 prediction is already in the pull request's checks. Once cdk deploy is in and the enclave restarts, the new image is serving in about thirty seconds (the allocator already has memory and vCPUs reserved).

The end-to-end picture

Flattened into one diagram, the deployment looks like this. It is busy but it covers every component the system depends on:

                ┌─────────────────────────┐
                │  Browser (Phantom)      │
                │  signs intent with      │
                │  user's Solana key      │
                └────────────┬────────────┘
                             │ HTTPS POST { intent,
                             │              intent_sig,
                             │              unsigned_tx_hex }
                             ▼
                ┌─────────────────────────┐
                │  Trading Lambda         │
                │  (tokyo, sg, fra, lon)  │
                │  forwards to TEE        │
                └────────────┬────────────┘
                             │ HTTPS bearer
                             │ https://tee.polylayer.xyz
                             ▼
                ┌─────────────────────────┐
                │  nginx :443             │
                │  Let's Encrypt cert     │
                │  on parent EC2          │
                └────────────┬────────────┘
                             │ socat
                             │ vsock 16:8080
                             ▼
        ┌────────────────────────────────────────┐
        │  Nitro Enclave  (PCR0-measured EIF)    │
        │                                        │
        │  · verify intent_sig (ed25519)         │
        │  · check intent.expires_at             │
        │  · match intent.action → validator     │
        │  · decode payload, compare each field  │
        │    to intent's claims                  │
        │  · HKDF-derive per-user key            │
        │      EVM    : secp256k1 (Polygon/Arb)  │
        │      Solana : ed25519   (Jup / session)│
        │  · sign tx                             │
        │                                        │
        │  ───  outbound (via parent vsock) ───  │
        │                                        │
        │  · KMS: attestation-gated Decrypt of   │
        │    sealed mnemonic from S3             │
        │  · refresh STS creds every ~5h         │
        │  · GET/PUT user session blobs in DDB   │
        └──────────┬────────────────────────┬────┘
                   │                        │
                   │ signed EVM tx          │ signed Solana tx
                   ▼                        ▼
        ┌─────────────────────┐  ┌─────────────────────┐
        │  Polygon / Arbitrum │  │  Solana RPC         │
        │  eth_sendRawTx      │  │  sendTransaction    │
        └─────────────────────┘  └─────────────────────┘

Reading top to bottom: the user signs an intent in their Phantom wallet, the trading lambda forwards it together with the unsigned transaction (EVM or Solana), nginx terminates TLS at the parent EC2, socat carries the request the rest of the way over vsock, the enclave validates the intent and the payload, derives the appropriate per-user key, and signs. EVM transactions go back out to Polygon or Arbitrum; Solana transactions (Jupiter perps, polyleverage session orders, attestor signatures) go back out to a Solana RPC. The enclave's outbound AWS calls (KMS for attestation-gated decryption, S3 for the sealed ciphertext, DynamoDB for user session blobs) all exit through parent-side vsock-proxy units back to AWS, with end-to-end TLS preserved to the real AWS endpoints. The whole deployment runs on one EC2 instance and costs under a couple of hundred dollars a month at current volumes.