Sandboxed execution

Every agent action happens inside a disposable container, isolated per task and bounded by the container wall. This page covers what runs inside, how profiles configure it, and how the lifecycle works.

Every action an agent takes in Disco Parrot happens inside a sandbox: a disposable container, isolated per task, that pairs the agent with a small HTTP service the platform owns. The agent does its work against a checkout of your code in /workspace, the platform watches what happens through that service, and the container is the wall around everything the agent can reach. Sandboxes are where reviewable autonomy becomes a real boundary instead of a promise: the registered set of tools, the credential policy, and the container together decide what an agent can actually do.

What runs inside

A sandbox is a single container instance running an agent alongside a small platform-owned sidecar. Every file the agent reads or writes, every command it runs, every tool result it returns, and every stream the platform reads back goes through the sidecar's HTTP API. The host process that started the container never reaches into its filesystem or shell directly. The agent's working directory is /workspace, and shell commands are confined to paths under that root.

This is the part that matters for trust. The platform does not "trust the agent to do the right thing." It exposes a defined surface, the sidecar, and lets the agent act through that surface inside a container the platform can pause or destroy at will. The details on what the agent can and cannot reach across that wall live in approved actions.

Isolation: one key, one container

Each sandbox is tagged with an isolation key, and the rule is one key, one container at a time. The key encodes the kind of work and the thing it is for, so the platform can route a request to the right container or start a fresh one when the existing one has drifted out of shape. For the security-review version of this boundary (the keyed gateway, the sealed workspace, and what a run can and cannot reach), see how sandboxed execution isolates agents.

Three kinds matter day to day:

chat-{tenantId}-{conversationId}: one container per Chat or Ask session, so a conversation always lands on the same workspace.
flow-{instanceId}: one container per Flow run, so each run is isolated from the next.
background-{taskId}: one container per background task.

Two more keys exist for warm-pool prewarming and for "test a profile" runs, both described below. Tenant IDs are baked into the chat keys so two tenants can never collide on the same conversation slot.
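As a rough illustration of the key formats above and the "one key, one container" rule, here is a minimal sketch. The helper names and the in-memory `SandboxRegistry` are hypothetical and are not the platform's API; only the three key shapes come from this page.

```python
from typing import Dict


def chat_key(tenant_id: str, conversation_id: str) -> str:
    # chat-{tenantId}-{conversationId}: the tenant id keeps tenants apart.
    return f"chat-{tenant_id}-{conversation_id}"


def flow_key(instance_id: str) -> str:
    # flow-{instanceId}: one container per Flow run.
    return f"flow-{instance_id}"


def background_key(task_id: str) -> str:
    # background-{taskId}: one container per background task.
    return f"background-{task_id}"


class SandboxRegistry:
    """Hypothetical registry: at most one container per isolation key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}
        self._counter = 0

    def acquire(self, key: str) -> str:
        # Reuse the container already bound to this key, else start a new one.
        if key not in self._by_key:
            self._counter += 1
            self._by_key[key] = f"container-{self._counter}"
        return self._by_key[key]
```

With this sketch, two requests for the same conversation resolve to the same container, while the same conversation id under a different tenant resolves to a different one.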

Sandbox profiles

A sandbox profile is the configuration a sandbox launches with: the image, the tools, the repos to check out, the AI runtime to use, the environment to attach, the credentials it may lease, and a few policy controls. Profiles are how you make a particular kind of work reproducible, so a "Node 22, Postgres, our two repos" sandbox is one click to launch, every time, the same way.

What a profile carries

A profile sets the following:

Identity. A name, a description, and whether it serves the platform surface or a project context.
Where it runs. The kind of host (Docker locally, Azure Container Apps, BYO Kubernetes), the specific host to point at, and any BYO container image you want to use as the base.
Image composition. The platform extensions and feature hooks layered into the image, so you do not have to build a custom image from scratch to get more capability.
Repositories. The repos the sandbox will check out, each with a branch, a mount path, a role (primary, context, or reference), and whether the checkout is read-only. Clone-time credentials are never stored on the profile; they are minted from your workspace's credential store at launch.
The AI runtime. Which provider and, when you want it pinned, which model power the agent in this sandbox.
Tools. The system packages and command-line tools to install, the MCP connections that should be available, and the skills the action launcher offers.
Managed commands. A catalog of named shell commands the agent may run, with three enforcement modes. Advisory is the default: the catalog is preferred but the native shell is unrestricted. Redirect-known is stricter: when the agent runs a native command that matches a catalog entry, the call is rejected and the agent is told to use the managed version instead. Strict-command is the tightest: only catalog commands run, and the agent's shell itself is closed.
The environment attached. This is where the change policy lives.
Credential capabilities. The named capabilities the sandbox may request leases against, such as Git push on a specific repo.
Workload identity. When a profile needs to call a cloud resource without a long-lived secret, it can bind a workload identity (Azure user-assigned managed identity today) that sandbox processes use to authenticate as themselves.
Resource class. Planning, development, or development-large, with optional CPU and memory overrides.
Access control and prompts. Who can use the profile, and any required inputs a launcher should ask for.
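The three managed-command enforcement modes in the list above can be read as a small decision function. The sketch below is illustrative only: the `Mode` enum, the `decide` function, the catalog shape (managed name mapped to the native program it wraps), and matching on the program name are all assumptions, not the platform's actual matching rules.

```python
from enum import Enum
from typing import Dict


class Mode(Enum):
    ADVISORY = "advisory"
    REDIRECT_KNOWN = "redirect-known"
    STRICT_COMMAND = "strict-command"


def decide(mode: Mode, catalog: Dict[str, str], call: str, managed: bool) -> str:
    """Return "allow" or "reject" for a single call.

    catalog maps a managed command name to the native program it wraps
    (an assumed shape). `managed` says whether the agent invoked a catalog
    entry by name or typed a native shell command.
    """
    if managed:
        # Catalog commands run in every mode, as long as the entry exists.
        return "allow" if call in catalog else "reject"
    if mode is Mode.STRICT_COMMAND:
        # The agent's shell is closed: only catalog commands run.
        return "reject"
    if mode is Mode.REDIRECT_KNOWN:
        parts = call.split()
        program = parts[0] if parts else ""
        # A native command that has a managed equivalent is rejected.
        if program in catalog.values():
            return "reject"
    # Advisory, or redirect-known with no matching entry: native shell is open.
    return "allow"
```

For example, with a catalog of `{"run-tests": "pytest"}`, a native `pytest -q` is allowed under advisory, rejected under redirect-known, and rejected under strict-command, while the managed `run-tests` entry is allowed in all three.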

Profiles also carry a per-profile Agent Instructions override on their own tab, so a profile pointed at a different stack can carry its own guidance without changing the workspace default.

What goes into the image

The image composition deserves a closer look, because it is the part that lets a workspace assemble a sandbox out of pieces instead of maintaining a custom Dockerfile per stack.

Features are named building blocks (Node 22, Postgres, the CLI for your cloud) drawn from the same devcontainer Features catalog the rest of the industry uses. You declare what you want and the platform layers it into the image at build time.
Build hooks run as root while the image is being built. Use them for the heavy work that should happen once per image, not once per launch.
Runtime hooks run as the sandbox user at launch, with phases for create, post-create, and post-start. They handle the per-launch setup, like loading environment-specific config or seeding a database.
Prewarm hooks run during warm-pool preparation, so the heavy work is done before any agent ever attaches. Each prewarm hook declares the credentials it uses and a cache key, so warm containers are reusable across runs that share the same access profile.

The result is a sandbox image you describe rather than one you hand-roll, and any team member who can read the profile knows exactly what is inside.

You manage profiles at Platform → Sandbox Profiles, and the field-by-field how-to for the editor is in sandbox profiles. Each profile's detail page carries a Recent activity panel that surfaces the audit events scoped to it, alongside any recent test runs and the records currently using it, so when you want to know what changed on a profile lately, you do not have to leave it.

Screenshot to capture (save as public/docs-media/sandbox-profile-detail.png): the Sandbox Profile detail page, showing the tabbed editor (Overview, General, Runtime, Image, Resources, Commands, Agent Instructions) with the runtime, repos, and tools visible on the General and Runtime tabs.

Caption when added: A sandbox profile: image, runtime, repos, tools, environment, capabilities. The configuration a sandbox launches with.

Lifecycle

A sandbox moves through a small set of states: creating, running, paused, destroyed, and failed. Most of the time you do not notice the lifecycle at all; the platform handles it.

A container is created, runs while there is work to do, pauses when idle, and is destroyed when its work is done. A configuration change triggers a fresh container instead of leaving a stale one.

Idle auto-pause. A sandbox that nothing is touching is paused after the configured idle timeout. A paused container holds its state but stops costing compute.
Resume on use. The next time the same isolation key is requested, the platform reconnects to the paused container if it is still around, or it restarts the work in a fresh one if it is not.
Drift triggers a fresh container. If the profile, image, host, or runtime fingerprint has changed since the container was last started, the platform destroys it and brings up a new one. You do not get a stale container after a config edit.
Disposable by default. A sandbox is treated as throwaway: code state belongs in your repo, change records belong in the work model, and the container can be destroyed at any time without losing anything that matters. A profile can opt a sandbox into persistent workspace storage when persistence between runs is genuinely useful, but only on the Local Docker host kind today (Azure Container Apps and BYO Kubernetes sandboxes are always disposable). The volume is held for a short retention window and pruned once the profile no longer needs it.
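The resume and drift rules above amount to one decision made each time an isolation key is requested. The following is a minimal sketch under stated assumptions: the `fingerprint` helper, its hashing scheme, and the action names returned by `on_request` are hypothetical, and the platform's real fingerprint may cover different inputs.

```python
import hashlib
import json
from typing import Optional


def fingerprint(profile_version: str, image: str, host: str, runtime: str) -> str:
    # Hash the inputs whose change counts as drift (assumed field set).
    payload = json.dumps(
        {"profile": profile_version, "image": image, "host": host, "runtime": runtime},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def on_request(existing: Optional[dict], wanted: str) -> str:
    """Decide what to do when an isolation key is requested.

    existing is None or a dict with "state" and "fingerprint" keys.
    """
    if existing is None or existing["state"] in ("destroyed", "failed"):
        return "create"
    if existing["fingerprint"] != wanted:
        # Drift: never hand back a stale container after a config edit.
        return "destroy-and-create"
    if existing["state"] == "paused":
        return "resume"
    return "reuse"  # creating or running, and still matching the config
```

The ordering matters in this sketch: drift is checked before resume, so a paused container whose profile has since changed is replaced rather than reconnected.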

You can see the active sandboxes for your workspace at Platform → Sandboxes, with their status, the work they belong to, and last activity. Managing them (pausing, destroying, and clearing out the inactive ones) is covered in sandboxes.

Screenshot to capture (save as public/docs-media/sandbox-list.png): the Sandboxes list page, showing rows for every active sandbox with status badge (running / paused / creating / destroyed / failed), isolation key, profile, last activity, and quick actions.

Caption when added: The active sandboxes in a workspace, with status and the isolation key each one belongs to.

Warm-pool prewarming

A sandbox does not start free. Pulling an image, building a checkout, and installing tools all take time, and a brand-new container is the slowest version of the same work the platform will do a thousand times. The warm pool keeps a small inventory of containers built in advance for a profile, so when a person opens a chat or a Flow run starts, the platform attaches to a ready container instead of building one from scratch.

Cold launch

1. Pull the image
2. Create the container
3. Clone the repos
4. Install the tools

Ready in tens of seconds

Warm pool

1. Attach to a pre-built container

Ready almost immediately

A warm pool keeps containers built and ready to attach, so a new conversation, Flow run, or background task starts in seconds instead of waiting for a cold launch.

You turn the warm pool on per profile, set a minimum ready (the floor the pool tries to maintain) and a maximum ready (the ceiling it will hold), and the platform handles the rest. Three strategies cover the range of trade-offs: a pre-built sandbox container for the fastest start, a prepared volume that reuses heavy artifacts across runs, and the host's native pool mechanism when the host platform offers one. The platform picks the best strategy the host supports, or you can pin one if you want a particular balance between cost and speed.
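One way to picture the floor and ceiling is as a reconcile step that compares the number of ready containers with the two bounds. This is a sketch only: the `reconcile` function and its signed return convention are assumptions for illustration, and the platform's actual scheduling is not described on this page.

```python
def reconcile(ready: int, minimum: int, maximum: int) -> int:
    """Return how many containers to build (positive) or retire (negative).

    minimum is the floor the pool tries to maintain, maximum the ceiling
    it will hold. Zero means the pool is already within bounds.
    """
    if minimum < 0 or maximum < minimum:
        raise ValueError("expected 0 <= minimum <= maximum")
    if ready < minimum:
        return minimum - ready   # top the pool up to the floor
    if ready > maximum:
        return maximum - ready   # trim the pool down to the ceiling
    return 0
```

With a floor of 2 and a ceiling of 5, an empty pool asks for two builds, a pool of seven asks for two retirements, and a pool of three is left alone.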

Screenshot to capture (save as public/docs-media/sandbox-warm-pool.png): the Warm Pool section of the Sandbox Profile detail page, showing the Enabled toggle, the Minimum Ready and Maximum Ready inputs, the Strategy Policy selector (auto / fastest / cost-optimized), and the live state of the pool (how many containers are ready right now).

Caption when added: Warm-pool configuration on a sandbox profile, with the floor, ceiling, and strategy you set.

Where sandboxes run

A profile points at a host, which is the actual compute that runs the container. Disco Parrot supports several host kinds:

Local Docker. Runs the sandbox on a Docker engine on the operator's machine. Suited to development and demos.
Azure Container Apps. The platform-managed option. The sandbox runs on Microsoft's serverless container service, so your team does not run any infrastructure.
BYO Kubernetes. The platform installs a small operator into your own cluster, and your cluster runs the sandboxes. The platform itself never touches the Kubernetes API directly; the operator does the work inside your boundary.

You register hosts under Platform → Sandbox Hosts and pick the right one per profile, or you let a profile use the workspace's user-default host. Choosing a host is how you put a particular workload on the compute it should run on, the same way you point a profile at a particular image. The deployment choices in full, and the how-tos for standing up your own host, are in sandbox hosts and deployment options.

Test a profile

Before you wire a profile into real work, you can test it. The platform launches a one-off sandbox under a profile-test: isolation key, runs a quick smoke check inside it, probes the runtime tools you declared, and verifies that the AI runtime is reachable. If a tool is missing or a key is wrong, the test fails with a clear message before any agent goes near it. Each test run is recorded, so the trail of "did this profile boot last Tuesday" is there when you need it. The full test surface (the probes a run executes and how to read a passed or failed result) is covered in test a sandbox profile.
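A profile test of this kind can be thought of as a list of named probes whose results roll up into one pass or fail outcome. The sketch below assumes a hypothetical `run_profile_test` function and a probe signature returning `(passed, message)`; the check names in the usage note are placeholders, not the platform's real probe identifiers.

```python
from typing import Callable, Dict, List, Tuple

# A probe returns (passed, message); both the alias and the shape are assumed.
Probe = Callable[[], Tuple[bool, str]]


def run_profile_test(checks: List[Tuple[str, Probe]]) -> Dict[str, object]:
    rows = []
    for name, probe in checks:
        try:
            passed, message = probe()
        except Exception as exc:
            # A probe that crashes counts as a failed check, with the reason kept.
            passed, message = False, f"probe raised: {exc}"
        rows.append({"check": name, "passed": passed, "message": message})
    # The overall result passes only when every individual check passes.
    return {"passed": all(row["passed"] for row in rows), "checks": rows}
```

Calling it with, say, a passing smoke check and a tool probe that returns `(False, "node not found")` yields an overall failure whose per-check rows still carry the message for the failing probe.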

Screenshot to capture (save as public/docs-media/profile-test-result.png): the Test Profile result drawer, showing the overall pass / fail outcome, the per-check rows (sidecar smoke check, runtime tool probes, Anthropic auth probe, capability probes), and the message for any failure.

Caption when added: A profile test boots a one-off sandbox and reports what works and what does not.

Open in your IDE

A running sandbox is a real workspace, and sometimes you want to look at it from your editor. The platform issues a time-limited IDE session for a sandbox in one of two modes: inspect, where you can read and explore but not write, and write, where edits land back in the sandbox. The session connects either through SSH into the container (so VS Code Remote-SSH and JetBrains Gateway work) or through a browser-based editor served over a tunneled connection from the sidecar. Per-profile policy decides whether the browser write mode is allowed at all. The full how-to (the three ways in and how a session is bounded) is in open a sandbox in your IDE.

Every session carries a short time-to-live on the credentials it issues, and a session can be revoked at any time. Once revoked, the next attempt to attach returns an error rather than silently reconnecting, and the keys do not outlive the session that issued them. An editor that walked off with a tunnel cannot keep using it after the session is gone.
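The bounds described above (a short time-to-live, revocation at any time, and an error on the next attach) can be sketched as a small session object. The `IdeSession` class, its field names, and the use of `PermissionError` are illustrative assumptions, not the platform's session API.

```python
from dataclasses import dataclass


@dataclass
class IdeSession:
    mode: str          # "inspect" (read-only) or "write"
    expires_at: float  # absolute expiry time, in seconds
    revoked: bool = False

    def can_write(self) -> bool:
        return self.mode == "write"

    def revoke(self) -> None:
        self.revoked = True

    def attach(self, now: float) -> None:
        # Fail loudly instead of silently reconnecting.
        if self.revoked:
            raise PermissionError("session revoked")
        if now >= self.expires_at:
            raise PermissionError("session expired")
```

Passing `now` explicitly keeps the sketch deterministic; in practice a caller would supply the current time, for example from `time.time()`.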

Network egress

Outbound network from a sandbox is governed by the host it is running on, not by the sandbox itself. A Local Docker host inherits whatever the Docker daemon allows. An Azure Container Apps environment uses the network rules of the ACA setup. A BYO Kubernetes host follows your cluster's network policy. If you need a tighter egress story than the default, the place to set it is on the host, where the rule applies across every sandbox the host launches rather than being repeated on every profile. How a bring-your-own host connects to the platform (the outbound channel, the short-lived tokens, and the exact egress to allow) is covered in network and connectivity.

Why it matters

For your team, sandboxes mean every run starts from a known state, never carries leftover changes from the last one, and never has to fight a stale environment. The pause-and-resume model means you can leave a conversation overnight and pick it up where you left off without paying for idle compute.

For security and compliance, the sandbox is the operating wall around an AI agent. The agent inside can write code freely against the checkout, but the credentials it might lease are short-lived (approved actions covers the policy), the change policies an environment attaches keep the riskiest actions reviewable, and the container itself is disposable. You can stop an agent by destroying its container, and you can prove what the agent did by reading the audit trail and the per-run transcript.

Approved actions. The credential and change policies a sandbox enforces at its edge.
AI models. The runtime a profile points at.
Audit trails. Everything a sandbox does, recorded.