execution backend for verl · openrlhf · trl · vpc-first · built in rust

Fork rollouts.
Replay failures.
Train from real states.

Arbor is the checkpoint-native execution backend for agentic RL frameworks. Give your VeRL / OpenRLHF / TRL rollout workers statistically independent environments — branch-safe fork, per-step trajectory collection, VPC-first credential brokering — without leaving your cluster.


rollout_benchmark.sh

# Fork 8 isolated rollouts from one bug-repro checkpoint.
# Collect trajectories, attribute reward, export winning patch.

arbor run-benchmark swebench-lite \
  --models claude-opus-4,claude-sonnet-4 \
  --forks 8 \
  --checkpoint repo@HEAD \
  --reward "cargo test --test integration"

# attempt-0  ✅  47 tool calls · $0.23 · 4m12s  ← patch exported
# attempt-1  ❌  61 tool calls · $0.31 · 6m08s  ← replay available
# attempt-2  ❌  38 tool calls · $0.18 · 3m44s
# ...                                             ← trajectories → s3://

arbor replay attempt-1 --from-step 20   # rewind to failure point
arbor diff attempt-0 attempt-3          # compare strategies cross-fork

// core differentiators

branch-safe restore

Firecracker warns that restoring the same checkpoint twice gives both VMs identical PRNG seeds, token caches, and SSH state. Arbor solves this with a quarantine + reseal protocol enforced at the infrastructure level — no application coordination required.
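To see why duplicate restores matter, here is a minimal sketch in plain Python (not Arbor code). It models a snapshot as a captured PRNG state: restoring it twice replays the same "random" stream, while mixing in fresh entropy after restore breaks the correlation. The `snapshot` and `restore` helpers are illustrative names, and `os.urandom` stands in for host-provided entropy.

```python
import os
import random


def snapshot(rng: random.Random):
    """Capture PRNG state, standing in for a VM memory snapshot."""
    return rng.getstate()


def restore(state, reseed: bool = False) -> random.Random:
    """Restore a 'fork' from a snapshot, optionally mixing in fresh entropy."""
    rng = random.Random()
    rng.setstate(state)
    if reseed:
        # Stand-in for entropy pushed from the host after restore.
        rng.seed(os.urandom(32))
    return rng


def draws(rng: random.Random, k: int = 5):
    return [rng.randrange(2**32) for _ in range(k)]


state = snapshot(random.Random(1234))

naive_a = draws(restore(state))
naive_b = draws(restore(state))
resealed_a = draws(restore(state, reseed=True))
resealed_b = draws(restore(state, reseed=True))

print(naive_a == naive_b)        # True: both forks replay the same stream
print(resealed_a == resealed_b)  # False with overwhelming probability
```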

vpc-first credentials

The VM never receives your API keys. The host-side egress proxy injects credentials at the network boundary, so agents see only OPENAI_API_KEY=arbor-brokered. Prompt injection, supply-chain attacks, and environment leaks cannot expose the real key, because it never enters the VM.

rollout dag

Every checkpoint records its parent, forming a DAG of execution history across all rollout attempts. Fork N agents from the same start state, let them explore independently, diff strategies cross-fork, and merge the winner — the execution graph your reward model actually needs.
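As a rough illustration of the parent-link idea, the sketch below stores one parent per checkpoint and walks the links to find where two forks diverged. The `Checkpoint` type, the IDs, and the single-parent simplification are assumptions for the example, not Arbor's actual manifest schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Checkpoint:
    id: str
    parent: Optional[str]  # None for the root checkpoint


def lineage(store: Dict[str, Checkpoint], ckpt_id: str) -> List[str]:
    """Walk parent links from a checkpoint back to the root."""
    path = []
    current: Optional[str] = ckpt_id
    while current is not None:
        path.append(current)
        current = store[current].parent
    return path


def common_ancestor(store: Dict[str, Checkpoint], a: str, b: str) -> Optional[str]:
    """Most recent checkpoint shared by two forks, i.e. where they diverged."""
    seen = set(lineage(store, a))
    for node in lineage(store, b):
        if node in seen:
            return node
    return None


store = {c.id: c for c in [
    Checkpoint("base", None),
    Checkpoint("repro", "base"),
    Checkpoint("attempt-0/step-3", "repro"),
    Checkpoint("attempt-1/step-5", "repro"),
]}

print(lineage(store, "attempt-0/step-3"))
print(common_ancestor(store, "attempt-0/step-3", "attempt-1/step-5"))  # repro
```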

trajectory tracer M10

Per-fork traces: prompt, tool calls, shell commands, file diffs, test results, network access, token cost, wall time. Reward attribution traces which steps contributed to the final outcome. Cross-fork diff shows where strategies diverged. Export winning trajectories as JSONL for fine-tuning. Replay any failure from any step.
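A minimal sketch of what per-step records and a cross-fork diff could look like. The `Step` fields are a hypothetical subset of the trace contents listed above, and the JSONL layout is an assumption rather than Arbor's export format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Step:
    fork: str
    index: int
    tool: str
    command: str
    reward: float = 0.0


def to_jsonl(steps: List[Step]) -> str:
    """Serialize one step per line, the usual shape for fine-tuning data."""
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in steps)


def first_divergence(a: List[Step], b: List[Step]) -> Optional[int]:
    """Index of the first step where two forks took different actions."""
    for i, (x, y) in enumerate(zip(a, b)):
        if (x.tool, x.command) != (y.tool, y.command):
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one fork simply ran longer
    return None


a = [Step("attempt-0", 0, "shell", "cargo test"),
     Step("attempt-0", 1, "edit", "src/parser.rs"),
     Step("attempt-0", 2, "shell", "cargo test", reward=1.0)]
b = [Step("attempt-1", 0, "shell", "cargo test"),
     Step("attempt-1", 1, "edit", "src/lexer.rs"),
     Step("attempt-1", 2, "shell", "cargo test")]

print(first_divergence(a, b))  # 1
print(to_jsonl(a).splitlines()[0])
```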

default-deny egress

Each workspace lives in its own Linux network namespace. The only route out is through the host-side proxy, so there is nothing to bypass: the namespace has no other route, not just a blocking rule. nftables enforces the allowlist at the kernel level.

self-hostable

The entire control plane, runner pool, and egress proxy run inside your VPC. Code, secrets, and agent activity never leave your network. Docker Compose for dev, Helm chart for production Kubernetes. Runner agents run on bare-metal KVM hosts and self-register.

sub-150ms boot

Built on Firecracker, the same microVM technology AWS uses for Lambda. Full VM isolation — not containers — with boot times competitive with container startups. Firecracker's Jailer provides an additional seccomp/cgroup isolation layer.

multi-runner pool

Scale across any number of bare-metal KVM hosts. Runner agents self-register, send heartbeats every 15 seconds, and drain gracefully on SIGTERM. The scheduler places workspaces on the least-loaded compatible runner. x86_64 and ARM64/Graviton2 pools are supported.
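The placement rule can be sketched as a filter followed by a minimum. This is an illustrative model only: the `Runner` fields, the load metric (active workspaces divided by capacity), and the three-missed-heartbeats staleness cutoff are assumptions, while the 15-second interval and runner class names come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

HEARTBEAT_INTERVAL_S = 15                   # from the description above
STALE_AFTER_S = 3 * HEARTBEAT_INTERVAL_S    # assumption: three missed beats


@dataclass
class Runner:
    id: str
    runner_class: str       # e.g. "fc-x86_64-v1"
    capacity: int
    active: int
    last_heartbeat: float   # seconds, same clock as `now`
    draining: bool = False

    def load(self) -> float:
        return self.active / self.capacity


def place(runners: List[Runner], runner_class: str, now: float) -> Optional[Runner]:
    """Pick the least-loaded live runner of the requested class, if any."""
    candidates = [
        r for r in runners
        if r.runner_class == runner_class
        and not r.draining
        and r.active < r.capacity
        and now - r.last_heartbeat <= STALE_AFTER_S
    ]
    return min(candidates, key=Runner.load, default=None)


now = 1000.0
runners = [
    Runner("runner-1", "fc-x86_64-v1", capacity=10, active=7, last_heartbeat=now - 5),
    Runner("runner-2", "fc-x86_64-v1", capacity=10, active=2, last_heartbeat=now - 5),
    Runner("runner-3", "fc-arm64-v1", capacity=10, active=0, last_heartbeat=now - 5),
    Runner("runner-4", "fc-x86_64-v1", capacity=10, active=0, last_heartbeat=now - 120),
]

chosen = place(runners, "fc-x86_64-v1", now)
print(chosen.id)  # runner-2 (runner-4 is idle but its heartbeat is stale)
```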

arm64 support

Native ARM64/Graviton2 runner class (fc-arm64-v1) with T2A CPU template enforcement. The control plane rejects misconfigured runners at registration with RUNNER_ARCH_MISMATCH. x86_64 and aarch64 workspaces are strictly isolated — snapshots can never cross architectures.
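One way to picture the registration check is a lookup table from runner class to expected architecture and CPU template. The class names, templates, and the RUNNER_ARCH_MISMATCH code are taken from this page; the `RegistrationError` type, the UNKNOWN_RUNNER_CLASS code, and the choice to use the same mismatch code for a wrong template are assumptions made for the sketch.

```python
# Runner class -> (architecture, CPU template), using the names on this page.
RUNNER_CLASSES = {
    "fc-x86_64-v1": ("x86_64", "T2"),
    "fc-arm64-v1": ("aarch64", "T2A"),
}


class RegistrationError(Exception):
    def __init__(self, code: str, detail: str):
        super().__init__(f"{code}: {detail}")
        self.code = code


def validate_registration(runner_class: str, reported_arch: str, cpu_template: str) -> None:
    """Reject a runner whose reported hardware does not match its class."""
    if runner_class not in RUNNER_CLASSES:
        # Hypothetical error code, not taken from the source.
        raise RegistrationError("UNKNOWN_RUNNER_CLASS", runner_class)
    arch, template = RUNNER_CLASSES[runner_class]
    if reported_arch != arch or cpu_template != template:
        raise RegistrationError(
            "RUNNER_ARCH_MISMATCH",
            f"{runner_class} expects {arch}/{template}, got {reported_arch}/{cpu_template}",
        )


def can_restore(snapshot_arch: str, runner_class: str) -> bool:
    """Snapshots may only be restored on runners of the same architecture."""
    return RUNNER_CLASSES[runner_class][0] == snapshot_arch


validate_registration("fc-arm64-v1", "aarch64", "T2A")   # accepted
print(can_restore("x86_64", "fc-arm64-v1"))              # False
```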

gpu inference

GPU-capable workspaces without VFIO passthrough. Workspaces call gpu.local; the egress proxy rewrites requests to a local sidecar (llama.cpp / vLLM / Ollama) and injects the model header host-side. GPU model weights and access tokens never enter the microVM. Runner classes fc-gpu-x86_64-v1 and fc-gpu-arm64-v1 report GPU capacity at registration; the scheduler enforces free GPU slot availability.

// architecture

THREE-LAYER ISOLATION · MULTI-RUNNER POOL

LAYER 1 — CONTROL PLANE (kubernetes / docker compose)

arbor-api
REST · WebSocket PTY · Scheduler · GET /metrics · axum

arbor-controller
State machine · Fork · Reseal · sqlx/postgres

↓ runner pool (HTTP · heartbeat every 15s) ↓

LAYER 2 — RUNNER POOL (bare-metal KVM hosts)

runner-1 · x86_64

fc-x86_64-v1 · T2
Firecracker + Jailer
GET /metrics · PUT /drain

runner-2 · x86_64

fc-x86_64-v1 · T2
Firecracker + Jailer
GET /metrics · PUT /drain

runner-3 · arm64

fc-arm64-v1 · T2A
Firecracker + Jailer
GET /metrics · PUT /drain

↓ per-workspace VM isolation ↓

LAYER 3 — HOST CONTROLS

arbor-egress-proxy

Allowlist · Credential injection · hyper

arbor-snapshot

Checkpoint DAG · S3/MinIO · sha256

arbor-secret-broker

Grant lifecycle · Vault integration

arbor-api — REST + WebSocket + runner pool

arbor-controller — state machine + scheduler

arbor-runner-agent — VM lifecycle + Prometheus + drain

arbor-guest-agent — musl binary in VM (x86_64 + arm64)

arbor-snapshot — checkpoint manifests

arbor-egress-proxy — CONNECT proxy

arbor-secret-broker — grant lifecycle

arbor-common — shared types + errors

// how fork safety works

reseal_protocol.rs

// Every fork goes through quarantine + reseal before READY.
// Enforced at infrastructure level. Zero app coordination needed.

fork(checkpoint_id)
  └─ new VM boots in QUARANTINED state
      ├─ all egress blocked
      ├─ all attach tokens invalidated
      └─ reseal hook chain runs:
              1. bump identity_epoch   →  new VM identity
              2. rotate session tokens
              3. re-sign preview URLs
              4. revoke + re-issue secret grants
              5. re-seed guest entropy via vsock
              ─────────────────────────────────
              state → READY

STEP 01

identity epoch

Each fork gets a new identity epoch, preventing session tokens from crossing workspace boundaries.

STEP 02

token rotation

All attach tokens from the parent snapshot are invalidated. New tokens are issued for the forked workspace only.

STEP 03

grant re-issue

Secret grants are revoked and re-issued with fresh IDs. The egress proxy registry is updated atomically.

STEP 04

entropy reseed

Guest PRNG is re-seeded via vsock, eliminating the shared-entropy correctness bug Firecracker warns about.
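The quarantine and reseal flow described above can be modeled as a small state machine: a fork copies its parent's identity, starts QUARANTINED with egress closed, and only becomes READY after every hook has run. This is a toy model under stated assumptions. The `Workspace` fields and hook bodies are invented for illustration, and the preview-URL and entropy hooks only record that they ran.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class State(Enum):
    QUARANTINED = "QUARANTINED"
    READY = "READY"


@dataclass
class Workspace:
    id: str
    identity_epoch: int
    session_token: str
    grant_ids: List[str]
    state: State = State.QUARANTINED
    egress_open: bool = False            # quarantined forks have no egress
    audit: List[str] = field(default_factory=list)


def bump_identity_epoch(ws: Workspace) -> None:
    ws.identity_epoch += 1


def rotate_session_token(ws: Workspace) -> None:
    ws.session_token = secrets.token_hex(16)


def resign_preview_urls(ws: Workspace) -> None:
    ws.audit.append(f"preview URLs re-signed for epoch {ws.identity_epoch}")


def reissue_grants(ws: Workspace) -> None:
    ws.grant_ids = [secrets.token_hex(8) for _ in ws.grant_ids]


def reseed_entropy(ws: Workspace) -> None:
    ws.audit.append("guest entropy reseeded")  # a real hook would talk to the guest


RESEAL_HOOKS: List[Callable[[Workspace], None]] = [
    bump_identity_epoch, rotate_session_token, resign_preview_urls,
    reissue_grants, reseed_entropy,
]


def fork(parent: Workspace, new_id: str) -> Workspace:
    """Clone parent identity into a new workspace that starts QUARANTINED."""
    return Workspace(new_id, parent.identity_epoch, parent.session_token,
                     list(parent.grant_ids))


def reseal(ws: Workspace, hooks: List[Callable[[Workspace], None]] = RESEAL_HOOKS) -> None:
    """Run every hook in order; only a fully resealed workspace becomes READY."""
    if ws.state is not State.QUARANTINED:
        raise ValueError("reseal only applies to quarantined workspaces")
    for hook in hooks:
        hook(ws)  # an exception here leaves the workspace quarantined
    ws.state = State.READY
    ws.egress_open = True


parent = Workspace("ws-parent", 1, secrets.token_hex(16), ["grant-a"],
                   state=State.READY, egress_open=True)
child = fork(parent, "attempt-0")
reseal(child)
print(child.state.value, child.identity_epoch)  # READY 2
```

Keeping the hooks in an ordered list makes the "no READY without a full reseal" rule a property of one function rather than something each caller has to remember.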

On credential brokering: when an agent calls api.openai.com, traffic flows: agent process → VM netns (blocked by default) → host TAP device → arbor-egress-proxy → allowlist check → credential injection → upstream. The VM never receives the credential value. Even if the agent logs its environment, leaks it to a supply-chain compromise, or is manipulated by prompt injection — the real key was never there.
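The allowlist check and credential injection steps of that flow might look roughly like the function below. It is a sketch of the idea only: the `Grant` type and `broker` function are hypothetical names, the secret is a dummy value, and TLS handling is ignored entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

PLACEHOLDER = "arbor-brokered"  # the only credential value the VM ever sees


@dataclass(frozen=True)
class Grant:
    header: str     # header to set, e.g. "Authorization"
    template: str   # e.g. "Bearer {secret}"
    secret: str     # real credential, held host-side only


def broker(host: str, headers: Dict[str, str], allowlist: Set[str],
           grants: Dict[str, Grant]) -> Optional[Dict[str, str]]:
    """Return the headers to send upstream, or None if the host is denied."""
    if host not in allowlist:
        return None  # default deny
    grant = grants.get(host)
    if grant is None:
        return dict(headers)
    # Drop whatever the guest sent for this header (case-insensitive), then inject.
    upstream = {k: v for k, v in headers.items() if k.lower() != grant.header.lower()}
    upstream[grant.header] = grant.template.format(secret=grant.secret)
    return upstream


allowlist = {"api.openai.com"}
grants = {"api.openai.com": Grant("Authorization", "Bearer {secret}", "sk-host-side-example")}
guest_headers = {"authorization": f"Bearer {PLACEHOLDER}", "content-type": "application/json"}

print(broker("api.openai.com", guest_headers, allowlist, grants)["Authorization"])
# Bearer sk-host-side-example
print(broker("evil.example", guest_headers, allowlist, grants))  # None
```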

// for rl training frameworks

Arbor is designed to be dropped in as the rollout environment backend for any agentic RL framework. Replace ad-hoc subprocess or Docker spawning with a backend that gives your training loop what it actually needs: statistically independent forks, per-step traces, and trajectory export — all in your VPC.

VeRL · OpenRLHF · TRL (HuggingFace) · custom rollout workers

PROBLEM

Correlated rollouts

Naive snapshot reuse gives N forks identical PRNG seeds → correlated outputs → biased advantage estimates in GRPO / PPO group sampling.

→ Branch-safe reseal: fresh entropy per fork
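To make the effect concrete, consider the group-normalized advantage commonly used in GRPO, A_i = (r_i - mean(r)) / (std(r) + eps). The sketch below shows the extreme case: if all N forks replay the same trajectory, every reward is identical and every advantage collapses to zero, so the group carries no learning signal. The reward values are made up, and the use of the population standard deviation is an assumption (implementations differ).

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight forks that share a PRNG stream and therefore behave identically.
correlated = [1.0] * 8
# Eight forks that explored independently (illustrative pass/fail rewards).
independent = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]

print(group_advantages(correlated))   # all 0.0: nothing to learn from
print([round(a, 2) for a in group_advantages(independent)])
```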

PROBLEM

No per-step credit assignment

You can see whether the test passed, but not which tool call caused the failure or which fork's strategy was better. Without per-step traces, attribution and replay are impossible.

→ M10 trajectory tracer: prompt → tool → diff → reward

PROBLEM

Training data leaves your VPC

Most hosted coding sandboxes are SaaS-only. For proprietary codebases, rollout trajectories collected for fine-tuning cannot be sent to a third-party cloud.

→ Fully self-hosted: Helm chart, runs in your k8s cluster

verl_rollout_worker.py — VeRL integration sketch

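# Drop into your VeRL rollout worker: replace subprocess/Docker with Arbor forks.
import asyncio

import httpx

ARBOR = "http://arbor-api:8080"


async def run_agent(workspace_id: str, reward_cmd: str) -> dict:
    # Placeholder: drive your agent policy inside the workspace, then score it
    # with reward_cmd. Must return a dict that includes "workspace_id".
    raise NotImplementedError("plug in your agent loop here")


async def rollout_batch(checkpoint_id: str, n: int, reward_cmd: str) -> list[str]:
    async with httpx.AsyncClient() as client:
        # Fork N isolated environments from one snapshot.
        # Each fork gets fresh entropy, session tokens, and secret grants.
        responses = await asyncio.gather(*[
            client.post(f"{ARBOR}/v1/checkpoints/{checkpoint_id}/fork",
                        json={"branch_name": f"attempt-{i}",
                              "post_restore": {"quarantine": True, "identity_reseal": True}})
            for i in range(n)
        ])
        for resp in responses:
            resp.raise_for_status()
        forks = [resp.json() for resp in responses]

        # Run agent policy in each fork; Arbor records per-step traces automatically.
        results = await asyncio.gather(*[
            run_agent(fork["workspace_id"], reward_cmd) for fork in forks
        ])

        # Export trajectories as JSONL → feed into PPO/GRPO advantage estimation.
        exports = await asyncio.gather(*[
            client.get(f"{ARBOR}/v1/workspaces/{r['workspace_id']}/trajectory")
            for r in results
        ])
        for resp in exports:
            resp.raise_for_status()

    # One JSONL document per fork; statistically independent, no shared entropy.
    return [resp.text for resp in exports]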

Feature	Arbor	E2B	Docker Sandboxes	Modal	Daytona
VM-level isolation	✓ Firecracker	✓ Firecracker	—	— container	— mixed
Fork from checkpoint	✓ first-class API	✗	✗	✗	✗
Branch-safe restore	✓ unique	✗	✗	✗	✗
Credential brokering	✓ host-side proxy	✗	~ partial	✗	✗
Default-deny egress	✓	~ partial	✓	✗	✗
Self-host / VPC-first	✓ first-class	✗ SaaS only	✗ SaaS only	✗ SaaS only	✓
Multi-runner pool + Helm	✓	✗	✗	~ partial	~ partial
ARM64 / Graviton2	✓ fc-arm64-v1	✗	✗	✓	✗
GPU inference (host-mediated)	✓ fc-gpu-*-v1	✗	~ VFIO only	✗	✗
Sub-150ms boot	✓	✓	✗	✓	✗
Open source	✓ MIT / Rust	~ SDK only	✗	✗	✓

// get started

Up in five minutes.

STEP 01

clone & configure

git clone https://github.com/Billy1900/Arbor
cd Arbor
cp deploy/.env.example deploy/.env

STEP 02

start the stack

make docker-up

# API:    localhost:8080
# Metrics: localhost:8080/metrics
# MinIO:  localhost:9001

STEP 03

register a runner

make register-dev-runner

# x86_64 → fc-x86_64-v1 T2
# ARM64  → make register-dev-runner-arm64

STEP 04

run the fork demo

make demo-fork

# Creates a workspace, snapshots it,
# and forks 3 branches

PRODUCTION / KUBERNETES
helm install arbor deploy/helm/arbor --namespace arbor --create-namespace \
--set api.config.databaseUrl="postgresql://..." \
--set api.config.attachTokenSecret="$(openssl rand -hex 32)"

