execution backend for verl · openrlhf · trl · vpc-first · built in rust

Fork rollouts.
Replay failures.
Train from real states.

Arbor is the checkpoint-native execution backend for agentic RL frameworks. Give your VeRL / OpenRLHF / TRL rollout workers statistically independent environments, with branch-safe forking, per-step trajectory collection, and VPC-first credential brokering, without leaving your cluster.

rollout_benchmark.sh
# Fork 8 isolated rollouts from one bug-repro checkpoint.
# Collect trajectories, attribute reward, export winning patch.

arbor run-benchmark swebench-lite \
  --models claude-opus-4,claude-sonnet-4 \
  --forks 8 \
  --checkpoint repo@HEAD \
  --reward "cargo test --test integration"

# attempt-0  ✅  47 tool calls · $0.23 · 4m12s  ← patch exported
# attempt-1  ❌  61 tool calls · $0.31 · 6m08s  ← replay available
# attempt-2  ❌  38 tool calls · $0.18 · 3m44s
# ...                                             ← trajectories → s3://

arbor replay attempt-1 --from-step 20   # rewind to failure point
arbor diff attempt-0 attempt-3          # compare strategies cross-fork
// core differentiators
01
branch-safe restore

Firecracker's snapshot documentation warns that restoring the same snapshot more than once gives every resulting VM identical PRNG state, token caches, and SSH state. Arbor solves this with a quarantine + reseal protocol enforced at the infrastructure level, so no application coordination is required.

02
vpc-first credentials

The VM never receives your API keys. The host-side egress proxy injects credentials at the network boundary, so agents only ever see OPENAI_API_KEY=arbor-brokered. Prompt injection, supply-chain attacks, and environment dumps cannot leak a key that never entered the VM.

03
rollout dag

Every checkpoint records its parent, forming a DAG of execution history across all rollout attempts. Fork N agents from the same start state, let them explore independently, diff strategies across forks, and merge the winner: the execution graph your reward model actually needs.
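To make the parent-pointer idea concrete, here is a minimal Python sketch, not Arbor's implementation, that walks a checkpoint's history back to the root and finds the checkpoint where two attempts diverged. It assumes each checkpoint has exactly one parent; the checkpoint IDs and the `lineage` / `fork_point` helpers are hypothetical, and merged checkpoints with several parents would need a list of parents instead.

```python
from typing import Optional


def lineage(parents: dict[str, Optional[str]], ckpt: str) -> list[str]:
    """Walk parent pointers from `ckpt` back to the root checkpoint (inclusive)."""
    chain: list[str] = []
    cur: Optional[str] = ckpt
    while cur is not None:
        chain.append(cur)
        cur = parents[cur]
    return chain


def fork_point(parents: dict[str, Optional[str]], a: str, b: str) -> Optional[str]:
    """Nearest checkpoint shared by the histories of `a` and `b`: where the attempts diverged."""
    ancestors_of_a = set(lineage(parents, a))
    # lineage() is nearest-first, so the first shared checkpoint is the closest common ancestor.
    for ckpt in lineage(parents, b):
        if ckpt in ancestors_of_a:
            return ckpt
    return None


# Hypothetical history: two attempts forked from "base", each with two checkpoints of its own.
parents = {
    "base": None,
    "attempt-0/step-1": "base",
    "attempt-0/step-2": "attempt-0/step-1",
    "attempt-1/step-1": "base",
    "attempt-1/step-2": "attempt-1/step-1",
}

print(lineage(parents, "attempt-0/step-2"))  # ['attempt-0/step-2', 'attempt-0/step-1', 'base']
print(fork_point(parents, "attempt-0/step-2", "attempt-1/step-2"))  # base
```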

10
trajectory tracer M10

Per-fork traces: prompt, tool calls, shell commands, file diffs, test results, network access, token cost, wall time. Reward attribution traces which steps contributed to the final outcome. Cross-fork diff shows where strategies diverged. Export winning trajectories as JSONL for fine-tuning. Replay any failure from any step.
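A rough sketch of what a consumer of the JSONL export might do: locate the step where two forks' strategies diverged and keep only the winning trajectories for fine-tuning. The record layout (`fork`, `reward`, `steps` with `tool`/`arg`) and every helper name here are assumptions for illustration; Arbor's actual export schema may differ.

```python
import json
from typing import Optional

# Hypothetical export: one JSON object per fork. Arbor's real JSONL schema may differ.
EXPORT = "\n".join(json.dumps(t) for t in [
    {"fork": "attempt-0", "reward": 1.0, "steps": [
        {"tool": "read_file", "arg": "src/lib.rs"},
        {"tool": "edit_file", "arg": "src/lib.rs"},
        {"tool": "shell", "arg": "cargo test"}]},
    {"fork": "attempt-1", "reward": 0.0, "steps": [
        {"tool": "read_file", "arg": "src/lib.rs"},
        {"tool": "shell", "arg": "grep -rn parse src"},
        {"tool": "shell", "arg": "cargo test"}]},
])


def load(jsonl: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl.splitlines() if line.strip()]


def divergence_step(a: dict, b: dict) -> Optional[int]:
    """First step index where two forks acted differently; None if their steps are identical."""
    for i, (step_a, step_b) in enumerate(zip(a["steps"], b["steps"])):
        if step_a != step_b:
            return i
    if len(a["steps"]) != len(b["steps"]):
        # One trajectory is a prefix of the other: they diverge where the shorter one stops.
        return min(len(a["steps"]), len(b["steps"]))
    return None


def winning_jsonl(trajectories: list[dict], threshold: float = 1.0) -> str:
    """Re-serialize only the forks that reached the reward threshold, e.g. for fine-tuning."""
    return "\n".join(json.dumps(t) for t in trajectories if t["reward"] >= threshold)


trajs = load(EXPORT)
print(divergence_step(trajs[0], trajs[1]))  # 1
print([json.loads(line)["fork"] for line in winning_jsonl(trajs).splitlines()])  # ['attempt-0']
```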

04
default-deny egress

Each workspace lives in its own Linux network namespace, and the only route out is through the host-side proxy. The proxy cannot be bypassed because no other route exists, not merely because a rule forbids one. nftables enforces the allowlist at the kernel level.
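As a sketch of the allowlist decision itself, here is a default-deny hostname check in Python. The matching rules (exact hostnames plus a `*.` wildcard that matches subdomains but not the apex) and the `egress_allowed` name are assumptions, not Arbor's documented allowlist syntax.

```python
def egress_allowed(host: str, allowlist: set[str]) -> bool:
    """Default deny: a host is reachable only if an allowlist entry explicitly matches it."""
    host = host.lower().rstrip(".")  # normalize case and a trailing root dot
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.github.com" matches "api.github.com" but neither "github.com" nor "evilgithub.com".
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False  # nothing matched: deny


allow = {"api.openai.com", "*.github.com"}
print(egress_allowed("api.openai.com", allow))  # True
print(egress_allowed("evilgithub.com", allow))  # False
```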

05
self-hostable

The entire control plane, runner pool, and egress proxy run inside your VPC. Code, secrets, and agent activity never leave your network. Docker Compose for dev, Helm chart for production Kubernetes. Runner agents run on bare-metal KVM hosts and self-register.

06
sub-150ms boot

Built on Firecracker, the same microVM technology AWS uses for Lambda. Full VM isolation, not containers, with boot times competitive with container startup. Firecracker's jailer adds a second isolation layer around each VM process (chroot, cgroups, namespaces, dropped privileges) on top of Firecracker's own seccomp filters.

07
multi-runner pool

Scale across any number of bare-metal KVM hosts. Runner agents self-register, send heartbeats every 15 seconds, and drain gracefully on SIGTERM. The scheduler places workspaces on the least-loaded compatible runner. x86_64 and ARM64/Graviton2 pools are supported.
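The placement rule can be sketched as follows, under assumptions the text does not spell out: a runner counts as dead after three missed 15-second heartbeats, load is measured as active workspaces divided by a per-runner capacity, and a draining runner accepts no new work. The `Runner` fields and `pick_runner` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_INTERVAL_S = 15
STALE_AFTER_S = 3 * HEARTBEAT_INTERVAL_S  # assumption: dead after 3 missed heartbeats


@dataclass
class Runner:
    runner_id: str
    runner_class: str        # e.g. "fc-x86_64-v1" or "fc-arm64-v1"
    capacity: int            # max concurrent workspaces (hypothetical unit)
    active: int              # workspaces currently placed on this runner
    last_heartbeat: float    # unix timestamp of the latest heartbeat
    draining: bool = False   # set once the runner has received SIGTERM


def pick_runner(runners: list[Runner], runner_class: str,
                now: Optional[float] = None) -> Optional[Runner]:
    """Least-loaded runner that matches the class, is alive, is not draining, and has a free slot."""
    now = time.time() if now is None else now
    candidates = [
        r for r in runners
        if r.runner_class == runner_class
        and not r.draining
        and now - r.last_heartbeat <= STALE_AFTER_S
        and r.active < r.capacity
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.active / r.capacity)  # lowest utilization wins


pool = [
    Runner("runner-1", "fc-x86_64-v1", capacity=8, active=6, last_heartbeat=995.0),
    Runner("runner-2", "fc-x86_64-v1", capacity=8, active=2, last_heartbeat=990.0),
    Runner("runner-3", "fc-arm64-v1", capacity=8, active=0, last_heartbeat=999.0),
    Runner("runner-4", "fc-x86_64-v1", capacity=8, active=0, last_heartbeat=999.0, draining=True),
    Runner("runner-5", "fc-x86_64-v1", capacity=8, active=0, last_heartbeat=900.0),  # stale
]
print(pick_runner(pool, "fc-x86_64-v1", now=1000.0).runner_id)  # runner-2
```

Filtering before ranking keeps the "compatible" and "alive" checks hard constraints, so a lightly loaded but stale or draining runner is never chosen.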

08
arm64 support

Native ARM64/Graviton2 runner class (fc-arm64-v1) with T2A CPU template enforcement. The control plane rejects misconfigured runners at registration with RUNNER_ARCH_MISMATCH. x86_64 and aarch64 workspaces are strictly isolated — snapshots can never cross architectures.
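A sketch of the registration-time check, using only the class/architecture/template pairs named on this page (fc-x86_64-v1 with T2, fc-arm64-v1 with T2A). The `RegistrationError` type, the `can_restore` helper, and the choice to raise `ValueError` for unknown classes are illustrative assumptions; only the `RUNNER_ARCH_MISMATCH` code comes from the text.

```python
# Pairs taken from the runner classes named on this page; the lookup itself is illustrative.
RUNNER_CLASS_SPEC = {
    "fc-x86_64-v1": ("x86_64", "T2"),
    "fc-arm64-v1": ("aarch64", "T2A"),
}


class RegistrationError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, detail: str):
        super().__init__(f"{code}: {detail}")
        self.code = code


def validate_registration(runner_class: str, host_arch: str, cpu_template: str) -> None:
    """Reject a runner whose host architecture or CPU template does not match its class."""
    if runner_class not in RUNNER_CLASS_SPEC:
        raise ValueError(f"unknown runner class {runner_class!r}")
    arch, template = RUNNER_CLASS_SPEC[runner_class]
    if (host_arch, cpu_template) != (arch, template):
        raise RegistrationError(
            "RUNNER_ARCH_MISMATCH",
            f"{runner_class} expects {arch}/{template}, got {host_arch}/{cpu_template}",
        )


def can_restore(snapshot_arch: str, runner_class: str) -> bool:
    """Snapshots never cross architectures: restore only onto a runner of the same arch."""
    return RUNNER_CLASS_SPEC[runner_class][0] == snapshot_arch
```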

09
gpu inference

GPU-capable workspaces without VFIO passthrough. Workspaces call gpu.local; the egress proxy rewrites requests to a local sidecar (llama.cpp / vLLM / Ollama) and injects the model header host-side. GPU model weights and access tokens never enter the microVM. Runner classes fc-gpu-x86_64-v1 and fc-gpu-arm64-v1 report GPU capacity at registration; the scheduler enforces free GPU slot availability.
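The host-side rewrite could look roughly like this. The sidecar address, the model-header name (`X-Model`), the bearer-token scheme, and all constant values are hypothetical placeholders; the point is that the model choice and access token are attached on the host, after the request has left the VM.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host-side settings: none of these values ever reach the microVM.
SIDECAR_BASE = "http://127.0.0.1:8000"   # local llama.cpp / vLLM / Ollama endpoint (assumed port)
HOST_MODEL = "example-7b-instruct"       # model chosen by the operator, not by the guest
MODEL_HEADER = "X-Model"                 # assumed header name
SIDECAR_TOKEN = "host-only-token"        # assumed sidecar access token


def rewrite_gpu_request(url: str, headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Route gpu.local traffic to the local sidecar; pass everything else through unchanged."""
    parts = urlsplit(url)
    if parts.hostname != "gpu.local":
        return url, headers
    sidecar = urlsplit(SIDECAR_BASE)
    new_url = urlunsplit((sidecar.scheme, sidecar.netloc, parts.path, parts.query, ""))
    # Drop whatever the guest set for host-managed headers, then inject host-side values.
    managed = {MODEL_HEADER.lower(), "authorization"}
    new_headers = {k: v for k, v in headers.items() if k.lower() not in managed}
    new_headers[MODEL_HEADER] = HOST_MODEL
    new_headers["Authorization"] = f"Bearer {SIDECAR_TOKEN}"
    return new_url, new_headers


url, hdrs = rewrite_gpu_request("http://gpu.local/v1/chat/completions?stream=false",
                                {"X-Model": "guest-pick", "Content-Type": "application/json"})
print(url)  # http://127.0.0.1:8000/v1/chat/completions?stream=false
```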

// architecture
THREE-LAYER ISOLATION · MULTI-RUNNER POOL
LAYER 1 — CONTROL PLANE (kubernetes / docker compose)
arbor-api
REST · WebSocket PTY · Scheduler · GET /metrics · axum
arbor-controller
State machine · Fork · Reseal · sqlx/postgres
↓ runner pool (HTTP · heartbeat every 15s) ↓
LAYER 2 — RUNNER POOL (bare-metal KVM hosts)
runner-1 · x86_64
fc-x86_64-v1 · T2
Firecracker + Jailer
GET /metrics · PUT /drain
runner-2 · x86_64
fc-x86_64-v1 · T2
Firecracker + Jailer
GET /metrics · PUT /drain
runner-3 · arm64
fc-arm64-v1 · T2A
Firecracker + Jailer
GET /metrics · PUT /drain
↓ per-workspace VM isolation ↓
LAYER 3 — HOST CONTROLS
arbor-egress-proxy
Allowlist · Credential injection · hyper
arbor-snapshot
Checkpoint DAG · S3/MinIO · sha256
arbor-secret-broker
Grant lifecycle · Vault integration
arbor-api — REST + WebSocket + runner pool
arbor-controller — state machine + scheduler
arbor-runner-agent — VM lifecycle + Prometheus + drain
arbor-guest-agent — musl binary in VM (x86_64 + arm64)
arbor-snapshot — checkpoint manifests
arbor-egress-proxy — CONNECT proxy
arbor-secret-broker — grant lifecycle
arbor-common — shared types + errors
// how fork safety works
reseal_protocol.rs
// Every fork goes through quarantine + reseal before READY.
// Enforced at infrastructure level. Zero app coordination needed.

fork(checkpoint_id)
  └─ new VM boots in QUARANTINED state
      ├─ all egress blocked
      ├─ all attach tokens invalidated
      └─ reseal hook chain runs:
              1. bump identity_epoch   → new VM identity
              2. rotate session tokens
              3. re-sign preview URLs
              4. revoke + re-issue secret grants
              5. re-seed guest entropy via vsock
              ─────────────────────────────────
              state → READY
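The ordering guarantee in the diagram, where a workspace leaves quarantine only after every hook has run, can be modeled in a few lines of Python. This is a simplified sketch rather than the real `reseal_protocol.rs`: the `Workspace` fields are assumptions, random strings stand in for re-signed preview URLs and re-issued grants, and `send_entropy` is a hypothetical callback standing in for the vsock channel.

```python
import secrets
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable


class State(Enum):
    QUARANTINED = "QUARANTINED"
    READY = "READY"


@dataclass
class Workspace:
    workspace_id: str
    identity_epoch: int
    session_token: str
    preview_sig: str
    grant_ids: list[str]
    state: State = State.QUARANTINED
    egress_blocked: bool = True


def reseal(ws: Workspace, send_entropy: Callable[[bytes], None]) -> None:
    if ws.state is not State.QUARANTINED or not ws.egress_blocked:
        raise RuntimeError("reseal requires a quarantined workspace with egress blocked")
    ws.identity_epoch += 1                                        # 1. new VM identity
    ws.session_token = secrets.token_urlsafe(32)                  # 2. rotate session tokens
    ws.preview_sig = secrets.token_hex(16)                        # 3. re-sign preview URLs (stand-in)
    ws.grant_ids = [secrets.token_hex(8) for _ in ws.grant_ids]   # 4. re-issue grants with fresh IDs
    send_entropy(secrets.token_bytes(32))                         # 5. re-seed guest entropy (vsock stand-in)
    # If any hook above raised, we never get here and the workspace stays quarantined.
    ws.state = State.READY
    ws.egress_blocked = False


def fork(parent: Workspace, new_id: str, send_entropy: Callable[[bytes], None]) -> Workspace:
    # The child boots from the parent's snapshot, so it starts with copies of the parent's secrets.
    child = replace(parent, workspace_id=new_id, grant_ids=list(parent.grant_ids),
                    state=State.QUARANTINED, egress_blocked=True)
    reseal(child, send_entropy)
    return child
```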
STEP 01
identity epoch

Each fork gets a new identity epoch, preventing session tokens from crossing workspace boundaries.

STEP 02
token rotation

All attach tokens from the parent snapshot are invalidated. New tokens are issued for the forked workspace only.

STEP 03
grant re-issue

Secret grants are revoked and re-issued with fresh IDs. The egress proxy registry is updated atomically.

STEP 04
entropy reseed

Guest PRNG is re-seeded via vsock, eliminating the shared-entropy correctness bug Firecracker warns about.
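One way to make tokens unusable across forks, assuming they are HMAC-signed by the control plane, is to bind each token to both the workspace ID and its identity epoch. The token format, the signing key, and the `issue_token` / `verify_token` helpers below are hypothetical; they only illustrate why bumping the epoch invalidates tokens copied from the parent snapshot.

```python
import hashlib
import hmac

# Hypothetical server-side signing key; it never leaves the control plane.
ATTACH_SECRET = b"replace-with-a-random-32-byte-secret"


def issue_token(workspace_id: str, epoch: int, secret: bytes = ATTACH_SECRET) -> str:
    msg = f"{workspace_id}:{epoch}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{workspace_id}:{epoch}:{mac}"


def verify_token(token: str, workspace_id: str, current_epoch: int,
                 secret: bytes = ATTACH_SECRET) -> bool:
    """Accept only tokens minted for this workspace and its current identity epoch."""
    try:
        tok_ws, tok_epoch, mac = token.rsplit(":", 2)
        epoch = int(tok_epoch)
    except ValueError:
        return False  # malformed token
    if tok_ws != workspace_id or epoch != current_epoch:
        return False  # minted for another workspace or an older epoch
    expected = hmac.new(secret, f"{tok_ws}:{epoch}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)  # constant-time comparison


parent_tok = issue_token("ws-parent", 3)
print(verify_token(parent_tok, "ws-a", 4))  # False: a token copied into a fork is rejected
```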

On credential brokering: when an agent calls api.openai.com, traffic flows agent process → VM netns (blocked by default) → host TAP device → arbor-egress-proxy → allowlist check → credential injection → upstream. The VM never receives the credential value. Even if the agent logs its environment, pulls in a compromised dependency, or is manipulated by prompt injection, the real key was never there to leak.
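The last two hops of that path, the allowlist check and credential injection, are sketched below on an already-decoded request. All names and values are hypothetical. The sketch also sets aside how a CONNECT proxy sees plaintext headers inside an HTTPS tunnel (for example, TLS interception with a CA the guest trusts), which this page does not describe.

```python
PLACEHOLDER = "arbor-brokered"

# Host-side credential store (fake values). The guest only ever sees PLACEHOLDER.
HOST_CREDENTIALS = {"api.openai.com": "sk-host-side-example"}
ALLOWLIST = {"api.openai.com", "pypi.org"}


def broker(host: str, headers: dict[str, str]) -> dict[str, str]:
    """Allowlist check, then swap the guest's placeholder credential for the real one."""
    if host not in ALLOWLIST:
        raise PermissionError(f"egress to {host} denied")
    # Never forward whatever Authorization value the guest sent.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    real = HOST_CREDENTIALS.get(host)
    if real is not None:
        out["Authorization"] = f"Bearer {real}"
    return out


guest_headers = {"Authorization": f"Bearer {PLACEHOLDER}", "Content-Type": "application/json"}
print(broker("api.openai.com", guest_headers)["Authorization"])  # Bearer sk-host-side-example
```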

// for rl training frameworks

Arbor is designed to be dropped in as the rollout environment backend for any agentic RL framework. Replace ad-hoc subprocess or Docker spawning with a backend that gives your training loop what it actually needs: statistically independent forks, per-step traces, and trajectory export — all in your VPC.

VeRL · OpenRLHF · TRL (Hugging Face) · custom rollout workers
PROBLEM
Correlated rollouts

Naive snapshot reuse gives N forks identical PRNG seeds → correlated outputs → biased advantage estimates in GRPO / PPO group sampling.

→ Branch-safe reseal: fresh entropy per fork
PROBLEM
No per-step credit assignment

With ad-hoc sandboxes you see whether the test passed, but not which tool call caused the failure or which fork's strategy was better, so attribution and replay are impossible.

→ M10 trajectory tracer: prompt → tool → diff → reward
PROBLEM
Training data leaves your VPC

Hosted coding sandboxes are typically SaaS-only, and rollout trajectories collected from proprietary codebases often cannot be sent to a third-party cloud.

→ Fully self-hosted: Helm chart, runs in your k8s cluster
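The correlated-rollouts problem is easy to reproduce with Python's `random` module standing in for guest entropy: restoring the same PRNG state four times yields four identical "independent" samples, while reseeding each restore with fresh OS entropy makes them diverge. This is an analogy for what a VM snapshot restore does to guest PRNG state, not a model of Firecracker itself; `sample_actions` and `restore` are illustrative names.

```python
import os
import random


def sample_actions(rng: random.Random, k: int = 5) -> list[int]:
    """Stand-in for an agent's stochastic choices (sampling, tie-breaks, retries, ...)."""
    return [rng.randrange(1000) for _ in range(k)]


def restore(snapshot: tuple, reseed: bool) -> random.Random:
    """Model a restore: the PRNG state comes back exactly as it was snapshotted."""
    rng = random.Random()
    rng.setstate(snapshot)
    if reseed:
        rng.seed(os.urandom(32))  # fresh per-fork entropy, analogous to a reseal
    return rng


parent = random.Random(1234)
sample_actions(parent)  # the parent ran for a while before being snapshotted
snapshot = parent.getstate()

naive = [sample_actions(restore(snapshot, reseed=False)) for _ in range(4)]
resealed = [sample_actions(restore(snapshot, reseed=True)) for _ in range(4)]
print(all(s == naive[0] for s in naive))  # True: four "independent" forks, one trajectory
print(len({tuple(s) for s in resealed}))
```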
verl_rollout_worker.py — VeRL integration sketch
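# Drop into your VeRL rollout worker: replace subprocess/Docker spawning with Arbor forks.
import asyncio
import json

import httpx

ARBOR = "http://arbor-api:8080"


async def run_agent(workspace_id: str, reward_cmd: str) -> dict:
    # Your policy loop goes here: drive the agent inside `workspace_id`, run `reward_cmd`,
    # and return a dict that includes at least {"workspace_id": workspace_id}.
    raise NotImplementedError


async def rollout_batch(checkpoint_id: str, n: int, reward_cmd: str) -> list[list[dict]]:
    async with httpx.AsyncClient(timeout=60.0) as client:  # tune the timeout for your cluster
        # Fork N isolated environments from one snapshot.
        # Each fork gets fresh entropy, session tokens, and secret grants.
        fork_resps = await asyncio.gather(*[
            client.post(f"{ARBOR}/v1/checkpoints/{checkpoint_id}/fork",
                        json={"branch_name": f"attempt-{i}",
                              "post_restore": {"quarantine": True, "identity_reseal": True}})
            for i in range(n)
        ])
        forks = []
        for resp in fork_resps:
            resp.raise_for_status()
            forks.append(resp.json())

        # Run the agent policy in each fork; Arbor records per-step traces automatically.
        results = await asyncio.gather(*[
            run_agent(fork["workspace_id"], reward_cmd) for fork in forks
        ])

        # Export trajectories as JSONL (one JSON object per line) for PPO/GRPO advantage estimation.
        traj_resps = await asyncio.gather(*[
            client.get(f"{ARBOR}/v1/workspaces/{r['workspace_id']}/trajectory")
            for r in results
        ])
        trajectories = []
        for resp in traj_resps:
            resp.raise_for_status()
            trajectories.append([json.loads(line) for line in resp.text.splitlines() if line.strip()])

    return trajectories  # statistically independent: no shared entropy across forks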
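For context on where those trajectories go, one common form of the GRPO group-relative advantage normalizes each fork's reward by its group's mean and standard deviation. The sketch below assumes binary test rewards and uses the population standard deviation (implementations vary on this and on `eps`). It also shows the degenerate case that correlated forks push toward: if every fork in a group earns the same reward, every advantage is zero and the group contributes no learning signal.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """A_i = (r_i - mean(r)) / (std(r) + eps) over one group of rollouts from the same start state."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Eight forks from one checkpoint, binary reward from the test command.
print([round(a, 3) for a in group_relative_advantages([1, 0, 0, 1, 0, 0, 0, 1])])
print(group_relative_advantages([1, 1, 1, 1]))  # [0.0, 0.0, 0.0, 0.0]
```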
// how it compares
Feature Arbor E2B Docker Sandboxes Modal Daytona
VM-level isolation Firecracker Firecracker container mixed
Fork from checkpoint first-class API
Branch-safe restore unique
Credential brokering host-side proxy ~ partial
Default-deny egress ~ partial
Self-host / VPC-first first-class SaaS only SaaS only SaaS only
Multi-runner pool + Helm ~ partial ~ partial
ARM64 / Graviton2 fc-arm64-v1
GPU inference (host-mediated) fc-gpu-*-v1 ~ VFIO only
Sub-150ms boot
Open source MIT / Rust ~ SDK only
// roadmap
M1
Single-node create / exec / terminate
complete
M2
Guest rootfs + private Docker daemon
complete
M3
Full VM checkpoint + S3 upload
complete
M4
Branch-safe fork: quarantine + reseal
complete
M5
Secret Broker + Egress Proxy
complete
M6
Multi-runner pool + Prometheus + Helm
complete
M7
Diff snapshots (Firecracker GA)
blocked upstream
M8
ARM64 runner class (fc-arm64-v1)
complete
M9
GPU-capable workspaces via host-mediated inference
complete
M10
Trajectory tracer + rollout debugger — per-step traces, cross-fork diff, reward attribution, replay
complete
M11
arbor run-benchmark CLI — SWE-bench / custom benchmarks, multi-model comparison, trajectory export
complete
// get started

Up in five minutes.

STEP 01

clone & configure

git clone https://github.com/Billy1900/Arbor
cd Arbor
cp deploy/.env.example deploy/.env
STEP 02

start the stack

make docker-up

# API: localhost:8080
# Metrics: localhost:8080/metrics
# MinIO: localhost:9001
STEP 03

register a runner

make register-dev-runner

# x86_64 → fc-x86_64-v1 T2
# ARM64 → make register-dev-runner-arm64
STEP 04

run the fork demo

make demo-fork

# Creates a workspace, snapshots it,
# and forks 3 branches
PRODUCTION / KUBERNETES
helm install arbor deploy/helm/arbor --namespace arbor --create-namespace \
  --set api.config.databaseUrl="postgresql://..." \
  --set api.config.attachTokenSecret="$(openssl rand -hex 32)"