Arbor is the checkpoint-native execution backend for agentic RL frameworks. Give your VeRL / OpenRLHF / TRL rollout workers statistically independent environments — branch-safe fork, per-step trajectory collection, VPC-first credential brokering — without leaving your cluster.

    # Fork 8 isolated rollouts from one bug-repro checkpoint.
    # Collect trajectories, attribute reward, export winning patch.
    arbor run-benchmark swebench-lite \
      --models claude-opus-4,claude-sonnet-4 \
      --forks 8 \
      --checkpoint repo@HEAD \
      --reward "cargo test --test integration"

    # attempt-0 ✅ 47 tool calls · $0.23 · 4m12s ← patch exported
    # attempt-1 ❌ 61 tool calls · $0.31 · 6m08s ← replay available
    # attempt-2 ❌ 38 tool calls · $0.18 · 3m44s
    # ... ← trajectories → s3://

    arbor replay attempt-1 --from-step 20   # rewind to failure point
    arbor diff attempt-0 attempt-3          # compare strategies cross-fork

Firecracker warns that restoring the same checkpoint twice gives both VMs identical PRNG seeds, token caches, and SSH state. Arbor solves this with a quarantine + reseal protocol enforced at the infrastructure level — no application coordination required.
The VM never receives your API keys. The host-side egress proxy injects credentials at the network boundary. Agents see OPENAI_API_KEY=arbor-brokered. Prompt injection, supply chain attacks, and environment leaks cannot expose the real key, because it never enters the VM.
Every checkpoint records its parent, forming a DAG of execution history across all rollout attempts. Fork N agents from the same start state, let them explore independently, diff strategies cross-fork, and merge the winner — the execution graph your reward model actually needs.
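
To make the parent-pointer structure concrete, here is a minimal sketch in Python. The `Checkpoint` record and the `lineage` and `common_ancestor` helpers are hypothetical names chosen for illustration, not Arbor's API; the sketch only assumes what the paragraph above states, namely that each checkpoint records its parent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint record: an ID plus a pointer to its parent."""
    id: str
    parent: Optional[str] = None


def lineage(store: Dict[str, Checkpoint], cid: str) -> List[str]:
    """Walk parent pointers from `cid` back to the root checkpoint."""
    path: List[str] = []
    current: Optional[str] = cid
    while current is not None:
        path.append(current)
        current = store[current].parent
    return path


def common_ancestor(store: Dict[str, Checkpoint], a: str, b: str) -> Optional[str]:
    """Return the nearest checkpoint that both `a` and `b` descend from."""
    seen = set(lineage(store, a))
    for cid in lineage(store, b):
        if cid in seen:
            return cid
    return None


# Illustrative history: three forks of one base, one of them forked again.
store = {c.id: c for c in [
    Checkpoint("base"),
    Checkpoint("attempt-0", parent="base"),
    Checkpoint("attempt-1", parent="base"),
    Checkpoint("attempt-1-retry", parent="attempt-1"),
]}
```

Finding the common ancestor of two forks is the natural starting point for a cross-fork diff, since everything before it is shared state.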
Per-fork traces: prompt, tool calls, shell commands, file diffs, test results, network access, token cost, wall time. Reward attribution traces which steps contributed to the final outcome. Cross-fork diff shows where strategies diverged. Export winning trajectories as JSONL for fine-tuning. Replay any failure from any step.
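
As a rough illustration of cross-fork diff and JSONL export, the sketch below compares two trajectories step by step. The record fields (`step`, `tool`, `cmd`) and both helper functions are assumptions made for this example; Arbor's actual trace schema may differ.

```python
import json
from typing import List, Optional

# Hypothetical per-step records; the field names are illustrative only.
attempt_0 = [
    {"step": 0, "tool": "shell", "cmd": "cargo test"},
    {"step": 1, "tool": "read_file", "cmd": "src/lib.rs"},
    {"step": 2, "tool": "edit_file", "cmd": "src/lib.rs"},
]
attempt_3 = [
    {"step": 0, "tool": "shell", "cmd": "cargo test"},
    {"step": 1, "tool": "read_file", "cmd": "src/lib.rs"},
    {"step": 2, "tool": "shell", "cmd": "git log -5"},
]


def first_divergence(a: List[dict], b: List[dict]) -> Optional[int]:
    """Index of the first step where two trajectories differ, or None."""
    for i, (sa, sb) in enumerate(zip(a, b)):
        if (sa["tool"], sa["cmd"]) != (sb["tool"], sb["cmd"]):
            return i
    # No mismatch in the shared prefix: they diverge only if lengths differ.
    return min(len(a), len(b)) if len(a) != len(b) else None


def to_jsonl(trajectory: List[dict]) -> str:
    """Serialize a trajectory as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(step, sort_keys=True) for step in trajectory)
```

Here `first_divergence(attempt_0, attempt_3)` points at step 2, where one fork edits the file and the other inspects the git history.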
Each workspace lives in its own Linux network namespace. The only route out is through the host-side proxy. Egress is physically impossible to bypass — there's no route, not just a rule. nftables enforces the allowlist at the kernel level.
The entire control plane, runner pool, and egress proxy run inside your VPC. Code, secrets, and agent activity never leave your network. Docker Compose for dev, Helm chart for production Kubernetes. Runner agents run on bare-metal KVM hosts and self-register.
Built on Firecracker, the same microVM technology AWS uses for Lambda. Full VM isolation — not containers — with boot times competitive with container startups. Firecracker's Jailer provides an additional seccomp/cgroup isolation layer.
Scale across any number of bare-metal KVM hosts. Runner agents self-register, send heartbeats every 15 seconds, and drain gracefully on SIGTERM. The scheduler places workspaces on the least-loaded compatible runner. x86_64 and ARM64/Graviton2 pools are supported.
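
The placement rule above can be sketched as follows. Only the 15-second heartbeat interval and the runner class names come from the text; the `Runner` fields, the three-missed-heartbeats staleness threshold, and the load metric (active workspaces divided by capacity) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

HEARTBEAT_INTERVAL_S = 15                  # from the text above
STALE_AFTER_S = 3 * HEARTBEAT_INTERVAL_S   # assumption: three missed heartbeats


@dataclass
class Runner:
    """Hypothetical view of a registered runner as seen by the scheduler."""
    id: str
    runner_class: str       # e.g. "fc-x86_64-v1" or "fc-arm64-v1"
    active_workspaces: int
    capacity: int
    last_heartbeat: float   # seconds on a monotonic clock
    draining: bool = False  # set once the runner has received SIGTERM


def pick_runner(runners: List[Runner], runner_class: str, now: float) -> Optional[Runner]:
    """Pick the least-loaded live runner of the requested class, if any."""
    candidates = [
        r for r in runners
        if r.runner_class == runner_class
        and not r.draining
        and now - r.last_heartbeat <= STALE_AFTER_S
        and r.active_workspaces < r.capacity
    ]
    return min(candidates, key=lambda r: r.active_workspaces / r.capacity, default=None)


# Illustrative pool: "d" is idle but has not sent a heartbeat recently.
pool = [
    Runner("a", "fc-x86_64-v1", active_workspaces=6, capacity=8, last_heartbeat=100.0),
    Runner("b", "fc-x86_64-v1", active_workspaces=2, capacity=8, last_heartbeat=100.0),
    Runner("c", "fc-arm64-v1", active_workspaces=0, capacity=8, last_heartbeat=100.0),
    Runner("d", "fc-x86_64-v1", active_workspaces=0, capacity=8, last_heartbeat=10.0),
]
```

Filtering on runner class before comparing load is what keeps an x86_64 workspace from ever landing on an ARM64 host, regardless of how idle that host is.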
Native ARM64/Graviton2 runner class (fc-arm64-v1) with T2A CPU template enforcement. The control plane rejects misconfigured runners at registration with RUNNER_ARCH_MISMATCH. x86_64 and aarch64 workspaces are strictly isolated — snapshots can never cross architectures.
GPU-capable workspaces without VFIO passthrough. Workspaces call gpu.local; the egress proxy rewrites requests to a local sidecar (llama.cpp / vLLM / Ollama) and injects the model header host-side. GPU model weights and access tokens never enter the microVM. Runner classes fc-gpu-x86_64-v1 and fc-gpu-arm64-v1 report GPU capacity at registration; the scheduler enforces free GPU slot availability.

    // Every fork goes through quarantine + reseal before READY.
    // Enforced at infrastructure level. Zero app coordination needed.
    fork(checkpoint_id)
    └─ new VM boots in QUARANTINED state
       ├─ all egress blocked
       ├─ all attach tokens invalidated
       └─ reseal hook chain runs:
          1. bump identity_epoch → new VM identity
          2. rotate session tokens
          3. re-sign preview URLs
          4. revoke + re-issue secret grants
          5. re-seed guest entropy via vsock
          ─────────────────────────────────
          state → READY

Each fork gets a new identity epoch, preventing session tokens from crossing workspace boundaries.
All attach tokens from the parent snapshot are invalidated. New tokens are issued for the forked workspace only.
Secret grants are revoked and re-issued with fresh IDs. The egress proxy registry is updated atomically.
Guest PRNG is re-seeded via vsock, eliminating the shared-entropy correctness bug Firecracker warns about.
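
The quarantine + reseal sequence can be modeled in a few lines of Python. This is a toy in-memory sketch, not Arbor's implementation: the `Workspace` fields, the global epoch counter, and the use of the `secrets` module in place of vsock delivery are all assumptions, and the preview-URL re-signing step is omitted for brevity.

```python
import itertools
import secrets
from dataclasses import dataclass
from typing import List

# Assumption: identity epochs come from one monotonically increasing counter.
_epochs = itertools.count(1)


@dataclass
class Workspace:
    """Hypothetical in-memory model of a workspace and its sensitive state."""
    identity_epoch: int
    session_token: str
    secret_grant_ids: List[str]
    entropy_seed: bytes
    state: str = "QUARANTINED"
    egress_allowed: bool = False


def fork(parent: Workspace) -> Workspace:
    """Restore the parent's state into a quarantined child, then reseal it."""
    # A freshly restored child is a bit-for-bit copy, with egress blocked.
    child = Workspace(
        identity_epoch=parent.identity_epoch,
        session_token=parent.session_token,
        secret_grant_ids=list(parent.secret_grant_ids),
        entropy_seed=parent.entropy_seed,
    )
    # Reseal hooks run while the workspace is still unreachable.
    child.identity_epoch = next(_epochs)              # new VM identity
    child.session_token = secrets.token_urlsafe(32)   # rotate session token
    child.secret_grant_ids = [secrets.token_hex(8) for _ in child.secret_grant_ids]
    child.entropy_seed = secrets.token_bytes(32)      # stand-in for vsock reseed
    # Only after every hook has run does the workspace leave quarantine.
    child.state = "READY"
    child.egress_allowed = True
    return child
```

The ordering is the point of the design: because egress stays blocked until the last hook finishes, no request can leave the VM carrying the parent's identity.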
On credential brokering: when an agent calls api.openai.com, traffic flows: agent process → VM netns (blocked by default) → host TAP device → arbor-egress-proxy → allowlist check → credential injection → upstream. The VM never receives the credential value. Even if the agent logs its environment, leaks it to a supply-chain compromise, or is manipulated by prompt injection — the real key was never there.
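
The allowlist check and credential injection steps in that flow might look roughly like this. The `broker_request` function, the `ALLOWLIST` and `HOST_SECRETS` tables, and the example key are hypothetical; how the proxy handles TLS is outside the scope of this sketch.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

PLACEHOLDER = "arbor-brokered"  # the value the agent sees, per the text above

# Hypothetical host-side state; real keys live only here, never in the VM.
ALLOWLIST = {"api.openai.com"}
HOST_SECRETS = {"api.openai.com": "sk-real-key-held-on-host"}


def broker_request(url: str, headers: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the headers to send upstream, or None if the request is denied."""
    host = urlsplit(url).hostname
    if host not in ALLOWLIST:
        return None  # default-deny: destinations off the allowlist are dropped
    outbound = dict(headers)  # copy so the caller's headers are left untouched
    if outbound.get("Authorization") == f"Bearer {PLACEHOLDER}":
        # Swap the placeholder for the real credential at the network boundary.
        outbound["Authorization"] = f"Bearer {HOST_SECRETS[host]}"
    return outbound
```

Anything the agent can read or log contains only the placeholder, so a leaked environment yields a value that is useless outside the proxy.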
Arbor is designed to be dropped in as the rollout environment backend for any agentic RL framework. Replace ad-hoc subprocess or Docker spawning with a backend that gives your training loop what it actually needs: statistically independent forks, per-step traces, and trajectory export — all in your VPC.
Naive snapshot reuse gives N forks identical PRNG seeds → correlated outputs → biased advantage estimates in GRPO / PPO group sampling.
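
A toy example shows why shared seeds hurt group sampling. It is a simplification, not a real training loop: the "rollout" is just a sum of random draws, and the advantage is reward minus the group mean, without the standard-deviation normalization that GRPO implementations typically apply.

```python
import random
import statistics
from typing import List


def rollout_reward(seed: int) -> float:
    """Toy rollout: the reward depends only on the fork's PRNG stream."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(10))


def group_advantages(rewards: List[float]) -> List[float]:
    """Group-relative advantage: each reward minus the group mean."""
    mean = statistics.fmean(rewards)
    return [r - mean for r in rewards]


# Naive snapshot reuse: every fork restores the same seed.
shared = [rollout_reward(seed=1234) for _ in range(4)]
# Reseeded forks: each one draws from its own stream.
reseeded = [rollout_reward(seed=s) for s in (1, 2, 3, 4)]

shared_adv = group_advantages(shared)      # identical rewards, no signal
reseeded_adv = group_advantages(reseeded)  # rewards differ across forks
```

With a shared seed, all four forks produce the same reward, so every advantage collapses to zero and the group contributes no gradient signal. Real agents are not fully determined by one seed, but the same mechanism pushes their outputs toward correlation.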
You can see if the test passed, not which tool call caused the failure or which fork's strategy was better. Attribution and replay are impossible.
Most hosted coding sandboxes are SaaS-only. For proprietary codebases, rollout trajectories collected for fine-tuning often cannot be sent to a third-party cloud.

    # Drop into your VeRL rollout worker: replace subprocess/Docker with Arbor forks.
    import asyncio
    import httpx

    ARBOR = "http://arbor-api:8080"

    async def rollout_batch(checkpoint_id: str, n: int, reward_cmd: str):
        async with httpx.AsyncClient() as client:
            # Fork N isolated environments from one snapshot.
            # Each fork gets fresh entropy, session tokens, and secret grants.
            responses = await asyncio.gather(*[
                client.post(f"{ARBOR}/v1/checkpoints/{checkpoint_id}/fork",
                            json={"branch_name": f"attempt-{i}",
                                  "post_restore": {"quarantine": True, "identity_reseal": True}})
                for i in range(n)
            ])
            for resp in responses:
                resp.raise_for_status()
            forks = [resp.json() for resp in responses]

            # Run agent policy in each fork; Arbor records per-step traces automatically.
            # run_agent is your own policy loop; it must return a dict with "workspace_id".
            results = await asyncio.gather(*[
                run_agent(fork["workspace_id"], reward_cmd) for fork in forks
            ])

            # Export trajectories as JSONL → feed into PPO/GRPO advantage estimation.
            exports = await asyncio.gather(*[
                client.get(f"{ARBOR}/v1/workspaces/{r['workspace_id']}/trajectory")
                for r in results
            ])
        # One JSONL string per fork; statistically independent, no shared entropy.
        return [resp.text for resp in exports]

| Feature | Arbor | E2B | Docker Sandboxes | Modal | Daytona |
|---|---|---|---|---|---|
| VM-level isolation | ✓ Firecracker | ✓ Firecracker | — | — container | — mixed |
| Fork from checkpoint | ✓ first-class API | ✗ | ✗ | ✗ | ✗ |
| Branch-safe restore | ✓ unique | ✗ | ✗ | ✗ | ✗ |
| Credential brokering | ✓ host-side proxy | ✗ | ~ partial | ✗ | ✗ |
| Default-deny egress | ✓ | ~ partial | ✓ | ✗ | ✗ |
| Self-host / VPC-first | ✓ first-class | ✗ SaaS only | ✗ SaaS only | ✗ SaaS only | ✓ |
| Multi-runner pool + Helm | ✓ | ✗ | ✗ | ~ partial | ~ partial |
| ARM64 / Graviton2 | ✓ fc-arm64-v1 | ✗ | ✗ | ✓ | ✗ |
| GPU inference (host-mediated) | ✓ fc-gpu-*-v1 | ✗ | ~ VFIO only | ✗ | ✗ |
| Sub-150ms boot | ✓ | ✓ | ✗ | ✓ | ✗ |
| Open source | ✓ MIT / Rust | ~ SDK only | ✗ | ✗ | ✓ |
arbor run-benchmark CLI: SWE-bench / custom benchmarks, multi-model comparison, trajectory export

    git clone https://github.com/Billy1900/Arbor
    cd Arbor
    cp deploy/.env.example deploy/.env
    make docker-up
    # API: localhost:8080
    # Metrics: localhost:8080/metrics
    # MinIO: localhost:9001

    make register-dev-runner
    # x86_64 → fc-x86_64-v1 T2
    # ARM64 → register-dev-runner-arm64

    make demo-fork
    # Creates workspace, snapshots, forks 3 branches