Run a Flow & manage checkpoints
Start a Flow run, choose how closely you watch it, read the step timeline and transcript, and approve, reject, or skip at each checkpoint. It also covers retry, resume, and cancel, and what the blocked and interrupted states mean.
A Flow is a template; a run is one execution of it. Running a Flow is where reviewable autonomy stops being a principle and becomes a thing you watch happen: an ordered set of steps, each producing a result you can read, with gates where you decide before the work continues. This page is about the run: how to start one, how to choose how closely you watch it, how to read what the agent did, how to approve or reject or skip at a checkpoint, and what to do when a run is blocked, fails, or gets interrupted by a deploy.
Writing the template, the steps and skills and conditions, is the sibling surface, Build a Flow.
Four ways to start a run
A Flow run begins from wherever the work lives. There are four launch paths, and they all converge on the same run.
- From the Flow's editor. The Run Flow action on a Flow detail page opens the run dialog. This is how you test a Flow you are building.
- From a record. The action launcher on a plan or initiative offers an Automated mode that runs a Flow against that record with its context already in scope.
- From a chat. Ask the agent in Chat to start a Flow and it calls start_flow; the run proceeds while the conversation continues.
- From a trigger. A Flow that carries a schedule or a pull-request trigger starts itself when the schedule fires or the event arrives.
However it starts, the run shows up under Sessions alongside your chats and background tasks, and the run detail page is where you watch it. A Flow that runs on a schedule can be paused and resumed from the schedule itself, so you can switch off its automatic firing for a while without losing the cadence.
A run can decline to start before any step executes, and the dialog tells you why. A required parameter left empty blocks the Start button until you fill it. A run started in the first few seconds after a platform update is asked to retry while the orchestrator finishes coming up. A pre-flight check that fails (a plan or repository the Flow needs is missing) surfaces as a start failure rather than a step failure, so a run that never really began is distinct from one that ran and broke. And a workspace that has used its daily run allowance is told the cap is reached. Each is a clear message at the point you click Start, not a silent stall.
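The checks a run passes before it starts can be sketched in a few lines of Python. This is an illustration only: `StartRequest`, `start_refusal`, the field names, the message strings, and the order in which the checks run are all assumptions made for the example, not the product's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StartRequest:
    # Hypothetical shape; every field name here is invented for the sketch.
    params: dict                         # required parameter name -> value
    orchestrator_ready: bool = True      # False just after a platform update
    missing_resources: list = field(default_factory=list)  # failed pre-flight lookups
    runs_started_today: int = 0
    daily_cap: Optional[int] = None      # None models an uncapped plan


def start_refusal(req: StartRequest) -> Optional[str]:
    """Return why the run declines to start, or None when it may start."""
    empty = [name for name, value in req.params.items() if value in (None, "")]
    if empty:
        return f"required parameter is empty: {empty[0]}"
    if not req.orchestrator_ready:
        return "platform is still starting, retry shortly"
    if req.missing_resources:
        return f"pre-flight check failed: {req.missing_resources[0]} is missing"
    if req.daily_cap is not None and req.runs_started_today >= req.daily_cap:
        return "daily flow run cap reached"
    return None
```

The point of the sketch is that each refusal is a distinct, named reason returned before any step runs, which is what keeps a start failure separate from a step failure.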
The three run choices
When you start a run from the dialog, you fill in the Flow's parameters and make three choices on top of them. Each one defaults to the Flow's own setting, and each one you can change for this run without touching the template.
A sandbox profile
Every run executes in an isolated sandbox, and the profile defines that sandbox: its base image and tools, the repositories it can see, and the AI runtime it uses. The profile is required, because a run needs an environment to execute in. Choosing the profile is how you point the same Flow at different stacks; a parameter named repositoryId even auto-selects a matching profile for you.
Interactive or background
The execution mode decides how much you watch and what happens to the environment while a checkpoint waits.
Interactive is a run you watch and approve as it goes; the run pauses for you at its checkpoints and holds its sandbox while it waits. Background is a run that proceeds on its own: it creates a background task, aggregates the turn count and cost across its steps onto that task, surfaces in Sessions and the Command Center, and pauses for you only where it must. A background run that hits a checkpoint flips its task into a review state and releases its sandbox while it waits, rather than holding the environment idle until you come back.
A checkpoint mode
The checkpoint mode decides which steps pause for your approval:
- Template defaults honors the Flow's own per-step checkpoints, pausing only at the steps the Flow marked.
- Pause before every step stops at each step, marked or not, for a run you want to watch closely.
- No checkpoints runs straight through with no pauses, for a Flow you trust to run unattended.
Setting the mode to pause before every step is the natural way to walk a new or risky Flow one stage at a time; setting it to none is how a well-worn Flow runs while you do something else.
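The three modes reduce to one small decision per step. A minimal sketch, assuming hypothetical names (`CheckpointMode`, `pauses_before`) that are not taken from the product:

```python
from enum import Enum


class CheckpointMode(Enum):
    # Hypothetical identifiers for the three dialog options.
    TEMPLATE_DEFAULTS = "template_defaults"
    EVERY_STEP = "every_step"
    NONE = "none"


def pauses_before(mode: CheckpointMode, step_is_marked: bool) -> bool:
    """Decide whether the run pauses for approval before a step."""
    if mode is CheckpointMode.EVERY_STEP:
        return True               # marked or not
    if mode is CheckpointMode.NONE:
        return False              # runs straight through
    return step_is_marked         # template defaults: only the marked steps


# Which steps of a four-step Flow pause when only the last one is marked?
marked = [False, False, False, True]
plan = {mode.name: [pauses_before(mode, m) for m in marked] for mode in CheckpointMode}
```

Here `plan` holds, for each mode, the list of pause decisions across the four steps, so the effect of changing the mode for one run is visible without touching the template's own markings.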
Watching a run
The run detail page is the single place where everything about a run is visible. It opens with the run's status badge and the actions available for that status in the header, a summary card, and three tabs: Timeline (the step-by-step view), Parameters (the inputs the run started with), and Transcript (the full record of what the agent did).
The summary card
The summary card carries the run at a glance: who started it, when, how long it has been going, the run's own identifier, the initiative, plan, and sandbox it is linked to (each a click away), the retry chain if the run was retried, and a preview of the parameters it started with. It is the orientation header for everything below it.
The Timeline tab
The Timeline tab is one card per step, in order. Each card shows a status icon (a spinner while running, a green check when complete, a red mark on failure, a pause glyph when blocked or paused, a forward arrow when skipped), the step number and name, a status badge, the elapsed time and the number of agent turns the step took, and a /skill-slug badge naming the skill the step ran. A step that produced something offers artifact buttons inline: View Initiative, View Plan, View PR, and Inspect Output, which opens a drawer with the step's summary, its artifacts as file paths, its raw structured output, and any tool calls that were auto-resolved (the place a background run's auto-approvals are written down). When a step auto-resolved a tool call, the card notes it.
When the run is paused at a checkpoint, the inline checkpoint card appears on the timeline at the step that is waiting. That is covered in Checkpoints below.
The Transcript tab
The Transcript tab is the full record of what the agent did, and it is the right place to read the work rather than the result. It opens with a setup card (the provisioning the run did before the first step) and then a transcript per step. The entries the transcript renders are:
| Entry | What it shows |
|---|---|
| text | The agent's prose, rendered as Markdown. |
| thinking | The agent's reasoning, in a collapsible block. |
| tool_start / tool_end | A tool call and its result, merged into one card. |
| tool_delta | Streaming output from a running tool (shell output, verbatim). |
| error | A tool or step error, in a red box. |
| setup / info | Provisioning and status notes. |
A tool_start entry carries the structured detail for the common tools: a file edit shows its diff, a write shows the new file, a web fetch shows the URL and a body preview, and a todo write shows the agent's working checklist. So the things the agent did to your workspace, the edits and writes, are not buried in text; they are cards you can read.
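Merging a tool_start with its tool_end into one card can be pictured as a single pass over the entries. The sketch below is an assumption about how such a merge could work, not the renderer's actual code: the `call_id`, `name`, `input`, and `result` fields are hypothetical, and the page does not say how the two entries are paired.

```python
def merge_tool_cards(entries: list) -> list:
    """Fold each tool_start/tool_end pair into one 'tool' card, keeping order."""
    cards = []
    open_cards = {}  # call_id -> card still waiting for its tool_end
    for entry in entries:
        kind = entry["type"]
        if kind == "tool_start":
            card = {
                "type": "tool",
                "call_id": entry["call_id"],
                "name": entry["name"],
                "input": entry.get("input"),
                "result": None,          # filled in when the tool_end arrives
            }
            open_cards[entry["call_id"]] = card
            cards.append(card)
        elif kind == "tool_end":
            card = open_cards.pop(entry["call_id"], None)
            if card is not None:
                card["result"] = entry.get("result")
            else:
                cards.append(dict(entry))  # unmatched end: keep it visible
        else:
            cards.append(dict(entry))      # text, thinking, tool_delta, error, ...
    return cards
```

Entries of every other type pass through unchanged, so the streamed tool_delta output still appears in the order it arrived.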
The timeline answers "what happened"; the transcript answers "how." When a step's result surprises you, the transcript is where you find the agent's reasoning, the exact tool calls, and the streamed shell output that produced it. It is the difference between a status and an explanation.
Read one step end to end and the shape comes through. Open the Implement step on Sarah's CSV run (the worked example later on this page). The setup card at the top shows it cloned orcette/insights. Below it, a thinking block holds the agent's reasoning about where the download button belongs; a text entry states what it is about to change; an Edit card shows the diff it wrote to server/reports/export.ts, lines removed in red and added in green; and a tool_delta entry streams the npm test output that ran after. You are not reading a claim that the work happened. You are reading the work.
Each step records the number of agent turns it took and the cost in dollars. The run timeline shows the turn count per step. The dollar cost is rolled up onto the run's Sessions record and, for a background run, onto its task card, where the run's total turn count and total cost are shown. Read the timeline for what happened; read the session for what it cost.
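The roll-up itself is plain addition over the steps. A minimal sketch, assuming a hypothetical `StepUsage` record; the turn counts and dollar figures in the usage lines are made up for illustration:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class StepUsage:
    # Hypothetical per-step record: what a step reports when it finishes.
    name: str
    turns: int
    cost_usd: Decimal


def roll_up(steps: list) -> tuple:
    """Sum per-step turns and cost into the totals shown on the session or task."""
    total_turns = sum(step.turns for step in steps)
    # Decimal avoids float rounding drift when adding currency amounts.
    total_cost = sum((step.cost_usd for step in steps), Decimal("0"))
    return total_turns, total_cost


# Illustrative numbers only.
usage = [
    StepUsage("Plan", 3, Decimal("0.12")),
    StepUsage("Implement", 14, Decimal("0.85")),
]
totals = roll_up(usage)
```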
Checkpoints: approve, reject, skip
A checkpoint is the moment a run pauses and hands you the decision. When a step the checkpoint mode marks comes up, the run pauses with status paused and shows the checkpoint inline on the timeline: the step's message, the previous step's summary so you have context, an optional comment box, and three buttons.
- Approve continues the run from that step. Anything you type in the comment box is recorded with the decision.
- Reject stops the run; its status becomes rejected, and your comment (or a default) is recorded as the reason.
- Skip marks the paused step skipped and advances to the next one. If the skipped step was the last, the run completes.
A checkpoint is not a rubber stamp, and the card is built to let you decide on evidence rather than reflex. The previous step's summary sits right there, so you approve when the result is what you expected, reject when the run is on the wrong track and should stop, and skip when this particular step does not apply this time but the run should go on. Your comment travels with the decision, so the reason is on the record for the next person or the next run.
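The three decisions map onto three different state changes. The sketch below models them on a hypothetical in-memory `PausedRun`; the class, the field names, and the default rejection reason are assumptions for illustration, and the sketch ignores that the next step after a skip may carry its own checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class PausedRun:
    # Hypothetical in-memory model of a run waiting at a checkpoint.
    steps: list                 # step names, in order
    current: int                # index of the step waiting for a decision
    status: str = "paused"
    step_status: dict = field(default_factory=dict)
    decisions: list = field(default_factory=list)


def decide(run: PausedRun, decision: str, comment: str = "") -> PausedRun:
    """Apply approve, reject, or skip to a run paused at a checkpoint."""
    if run.status != "paused":
        raise ValueError("run is not waiting at a checkpoint")
    step = run.steps[run.current]
    if decision == "approve":
        run.status = "running"                         # continues from this step
    elif decision == "reject":
        run.status = "rejected"
        comment = comment or "Rejected at checkpoint"  # placeholder default reason
    elif decision == "skip":
        run.step_status[step] = "skipped"
        if run.current == len(run.steps) - 1:
            run.status = "completed"                   # skipped step was the last
        else:
            run.current += 1
            run.status = "running"
    else:
        raise ValueError(f"unknown decision: {decision}")
    # The comment travels with the decision, whatever the decision was.
    run.decisions.append({"step": step, "decision": decision, "comment": comment})
    return run
```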
A run started from a chat shows the same decision as a dialog in the chat panel's Flow progress overlay rather than on a timeline, with the buttons labeled Approve & Continue, Reject, and Skip Step, and the completed step's artifact links right in the dialog; the decision is the same.
A background run still pauses at a checkpoint and waits for you, with one exception: a step its author marked to auto-approve in the background continues without waiting, and that auto-approval is written onto the step as an auto-resolved decision rather than passing silently, so the record shows exactly what advanced without you. Everything else parks the run in a review state (and releases its sandbox) until you come back and decide. The full semantics, including how the background auto-approve flag is set, are in Human checkpoints.
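How execution mode and the author's auto-approve flag combine at a checkpoint can be written as one function. This is a sketch under assumptions: the function name, the argument names, and the keys of the returned dictionary are hypothetical, chosen only to mirror the behavior described above.

```python
def on_checkpoint(execution_mode: str, auto_approve_in_background: bool) -> dict:
    """Describe what happens when a run reaches a step that needs approval."""
    if execution_mode == "background" and auto_approve_in_background:
        # Advances without a person, but leaves a record on the step.
        return {"waits": False, "recorded_as": "auto_resolved", "sandbox": "held"}
    if execution_mode == "background":
        # Parks the task for review and frees the environment meanwhile.
        return {"waits": True, "recorded_as": None, "sandbox": "released"}
    # Interactive: pauses and keeps the sandbox while it waits.
    return {"waits": True, "recorded_as": None, "sandbox": "held"}
```

The flag only matters in background mode in this sketch; an interactive run waits regardless of it.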
How a run moves through its states
A run moves through a clear set of states, and the status badge always tells you which one it is in. The states are not just a list; they connect, and the only way out of a stopped or stuck state is a person deciding to resume or retry.
| State | Meaning |
|---|---|
| pending | Queued, not yet started. |
| running | A step is executing. |
| paused | Waiting at a checkpoint for your decision. |
| completed | Every step finished. |
| failed | A step failed and the run stopped. |
| rejected | A person rejected the run at a checkpoint. |
| cancelled | Someone cancelled the run. |
| blocked | Waiting on something outside the agent: a credential, an environment issue, an unavailable integration. |
| interrupted by deploy | The platform was updated mid-run. The badge reads "Interrupted"; retry continues the run. |
Steps carry their own statuses within the run (pending, running, completed, failed, blocked, skipped, cancelled), which is what the timeline icons reflect. Each step's result is validated against a fixed shape for its skill: a step that finishes without producing its structured output is recorded as partial (it ran, but it did not hand back what the next step needs), and a result that is produced but does not match the shape is sent back for another pass before it would fail. Every result reports an outcome of success, partial, blocked, or error, and a step that reports itself blocked (it needs something it does not have) pauses the run for attention rather than pressing on.
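The validation rule can be sketched as a small classifier. Everything specific here is an assumption: the page does not say what a skill's fixed shape looks like, so the sketch models it as a set of required keys, and it models "another pass" with a hypothetical `passes_left` counter.

```python
from typing import Optional

OUTCOMES = {"success", "partial", "blocked", "error"}


def handle_result(output: Optional[dict], required_keys: set, passes_left: int) -> str:
    """Classify a finished step: an outcome, or 'send_back' for another pass."""
    if output is None:
        return "partial"                      # ran, but handed nothing back
    shape_ok = (
        required_keys.issubset(output.keys())
        and output.get("outcome") in OUTCOMES
    )
    if not shape_ok:
        # A malformed result gets another pass before it counts as a failure.
        return "send_back" if passes_left > 0 else "error"
    return output["outcome"]                  # the outcome the step reported
```

A returned value of blocked is the case where the run pauses for attention rather than pressing on.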
Retry, resume, and cancel
A run is not a one-shot. The header offers the action that fits the run's current state, so you do not have to remember which verb applies where:
- A blocked or cancelled run shows Resume.
- A failed run shows Retry.
- A run interrupted by deploy shows Retry Flow.
- A paused run has no header button; you decide right there on the inline checkpoint card.

- Resume picks the run back up from the first step that has not completed or been skipped. It is the action for a run that stopped cleanly: one that was blocked and is now unblocked, or one you cancelled and want to continue.
- Retry re-runs from the first step that has not completed, resetting that step and the ones after it to a clean state before it runs them again. A retry may continue the same run or clone it into a fresh instance; when it clones, the page follows you to the new run and the summary card records the retry chain so the lineage stays readable.
- Cancel stops a running or paused run, marks it cancelled, and releases its sandbox. Cancel appears only while a run is running or paused.
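The state-to-action pairing above is effectively a lookup table. A minimal sketch, assuming hypothetical state keys (the `interrupted_by_deploy` spelling in particular is invented) and making no claim about where on the page each button is drawn:

```python
# Recovery action offered for each stopped state, as listed above.
RECOVERY_ACTION = {
    "blocked": "Resume",
    "cancelled": "Resume",
    "failed": "Retry",
    "interrupted_by_deploy": "Retry Flow",
}


def available_actions(status: str) -> list:
    """Return the run-level actions that fit a run's current state."""
    actions = []
    if status in RECOVERY_ACTION:
        actions.append(RECOVERY_ACTION[status])
    if status in ("running", "paused"):
        actions.append("Cancel")   # only while the run is running or paused
    return actions
```

A completed or rejected run returns an empty list in this sketch, since the page names no action for those states.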
Resume or retry
Both pick up from the first step that is not already done, so neither repeats a completed step. The difference is intent, and the run's state already points you at the right one. Reach for resume when the run stopped cleanly and you are continuing it: you cleared a blocker, or you want a cancelled run to carry on. Reach for retry when the run ended badly and you are re-running it: a step failed, or a deploy interrupted it. Resume continues; retry resets the unfinished step first, then continues. In both cases the prior transcript stays on the run, so the history of what already happened is never lost; only the step state ahead of the pickup point is cleared.
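The shared pickup rule and the one difference between the two actions fit in a few lines. This sketch represents a run as a plain list of step statuses; the function names are hypothetical, and treating a skipped step as done for retry as well as resume is an assumption.

```python
from typing import Optional

DONE = {"completed", "skipped"}


def pickup_index(statuses: list) -> Optional[int]:
    """Index of the first step that is not already done, or None if all are."""
    for index, status in enumerate(statuses):
        if status not in DONE:
            return index
    return None


def resume(statuses: list) -> tuple:
    """Continue from the pickup point, leaving step state as it is."""
    return pickup_index(statuses), list(statuses)


def retry(statuses: list) -> tuple:
    """Reset the pickup step and every step after it, then continue from there."""
    index = pickup_index(statuses)
    if index is None:
        return None, list(statuses)
    reset = list(statuses[:index]) + ["pending"] * (len(statuses) - index)
    return index, reset
```

In both functions the steps before the pickup point are returned untouched, which is the sense in which neither action repeats a completed step.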
When a run is blocked
A blocked run is parked, not lost: it preserves where it was so a resume continues from the blocked step rather than starting over. It is waiting on something only a person can supply, and the banner on the run page tells you which.
- A credential is needed. A step could not clone a repository because the integration is not connected. The banner offers to connect it; once connected, you resume the run.
- An environment has an open issue. A step that needs a healthy environment hit an open health event on it. The banner links each health event so you can resolve it, then resume.
- An integration is unavailable. A tool the step needs (an MCP server, for example) did not come up. Resolve the integration, then resume.
A step can also report itself blocked when it cooperatively decides it does not have what it needs to proceed safely, which parks the run for a look rather than guessing. In every case the banner names the cause and points at the fix.
When a deploy interrupts a run
When the platform is updated while a run is in flight, the run is transitioned to interrupted by deploy rather than failed, and its sandbox is preserved. Steps that were streaming reconnect to the new platform revision and keep going; steps that could not continue are parked. The run page shows an "Interrupted" banner explaining that a retry continues from the first step that did not complete, and links the new instance if the retry cloned one. A deploy is a platform event, not a failure of your run, and the platform treats it that way.
For a run that stopped and the specific fix for each reason, see troubleshooting flows and runs.
A worked example
Sarah has the CSV download button plan ready to implement. She opens its action launcher, picks the Automated mode, and chooses the Full SDLC Flow. The run dialog opens; the plan is already in scope, she picks the Analytics-Node20 sandbox profile, leaves the execution mode on Interactive and the checkpoint mode on Template defaults, and clicks Start Flow.
The run page opens. The Plan step runs and completes in a few turns; the Implement step runs longer, writing the download-button code, and finishes with a View Plan artifact button. The Verify step runs the team's verification skill against the five customer fixtures and reports the work ready. The Create PR step is marked for a checkpoint, so the run pauses: the inline checkpoint card shows the verify step's summary and a message, "Sign off before the PR opens." Sarah reads the summary, types "Looks good, ship it" in the comment box, and clicks Approve.
The run continues. The Create PR step opens a real pull request on orcette/insights, authored as Sarah, and the run completes. The summary card shows the duration and the linked pull request; the transcript holds every edit the implement step made; the Sessions record holds the run's cost. One Flow, watched from start to finish, with a person's sign-off in the one place it mattered.
The same run, in the background
The next afternoon Sarah starts the same Full SDLC Flow on another plan, but this time she picks Background. The run does not open a page she watches; it becomes a task in the Command Center with a running badge, and she moves on to other work. When the run reaches the Create PR checkpoint, the task flips to a review state and releases its sandbox while it waits. A notification brings her back; she opens the task, reads the verify summary and the diff exactly as she would on the timeline, and approves. The run picks the sandbox back up and finishes, and the task card shows the total turns and cost.
The two runs are the same Flow with the same gate in the same place. The only thing that changed is how much of it Sarah watched, which is the whole point of the choice.
Flow steps can fan out
A step is not limited to one agent doing one thing. A step can spawn sub-agents, up to twenty in a batch, to work parts of a problem in parallel and report back, exactly as a chat agent can. The same rules apply as in chat: fan-out is one level deep, and recursive fan-out (a sub-agent spawning its own sub-agents) is rejected. The sub-agents run co-located on the step's sandbox, and their work rolls up into the step's result.
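The two fan-out limits can be expressed as a guard that runs before any sub-agent is spawned. A minimal sketch with hypothetical names; the page says recursive fan-out is rejected, but rejecting an oversized batch outright (rather than splitting it into several batches) is an assumption of this example.

```python
MAX_SUB_AGENTS_PER_BATCH = 20   # cap stated on this page
MAX_FAN_OUT_DEPTH = 1           # one level deep


def check_fan_out(batch_size: int, caller_depth: int) -> None:
    """Raise ValueError when a spawn request breaks the fan-out rules.

    caller_depth is 0 for the step's own agent and 1 for a sub-agent.
    """
    if caller_depth >= MAX_FAN_OUT_DEPTH:
        raise ValueError("recursive fan-out is rejected")
    if not 1 <= batch_size <= MAX_SUB_AGENTS_PER_BATCH:
        raise ValueError(f"batch must hold 1 to {MAX_SUB_AGENTS_PER_BATCH} sub-agents")
```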
How many runs a day
How many Flow runs your workspace can start in a day depends on your plan. The counter resets each UTC day.
| Plan | Flow runs per day |
|---|---|
| Free | 5 |
| Team | 20 |
| Business | Uncapped |
This is the cap behind a "you have reached your daily flow runs" message; it counts runs started, across the whole workspace, per UTC calendar day. The separate cap on how many Flow templates a workspace can keep is on the Build a Flow page.
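Counting against a UTC calendar day means the window is keyed on the UTC date, not on a rolling 24 hours. A minimal sketch using the caps from the table; the function names and the idea of passing in a list of start timestamps are assumptions for illustration.

```python
from datetime import datetime, timezone

# Caps from the table above; None models the uncapped Business plan.
DAILY_CAP = {"Free": 5, "Team": 20, "Business": None}


def utc_day(moment: datetime) -> str:
    """Return the UTC calendar day of a timezone-aware datetime."""
    return moment.astimezone(timezone.utc).date().isoformat()


def can_start(plan: str, run_starts: list, now: datetime) -> bool:
    """True when the workspace is still under its cap for the current UTC day."""
    cap = DAILY_CAP[plan]
    if cap is None:
        return True
    today = utc_day(now)
    used = sum(1 for started in run_starts if utc_day(started) == today)
    return used < cap
```

Runs started late on the previous UTC day do not count against today's allowance in this model, which is what a reset at the UTC day boundary implies.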
Permissions
| You want to | Scope |
|---|---|
| Watch a run and read its transcript | flows.read |
| Start, resume, retry, or decide a checkpoint | flows.run |
| Cancel anyone's run | flows.cancel.any |
flows.run is the everyday scope for the people who run the work, and it carries cancelling a run you started yourself; flows.cancel.any is the elevated scope for cancelling a run regardless of who started it. The actions on the run page appear for the scopes you hold.
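The cancel rule is the one place where the scope alone does not decide; who started the run matters too. A minimal sketch, with a hypothetical function name and plain strings standing in for user identities:

```python
def can_cancel(scopes: set, user: str, run_started_by: str) -> bool:
    """Check whether a user may cancel a given run."""
    if "flows.cancel.any" in scopes:
        return True                                   # elevated: any run
    # flows.run covers cancelling only a run you started yourself.
    return "flows.run" in scopes and user == run_started_by
```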
Why running a Flow looks like this
A long-running agent doing real work raises one hard question: how do you stay in control without standing over it the whole time? The two easy answers both fail. Watch every token and you have not automated anything; let it run unattended with no gates and you have an agent loose in your system with policy as the only seatbelt.
The run surface is the answer in between. The checkpoint mode is a dial you set per run: "pause before every step" when you are walking a new Flow through, the template defaults as the middle setting, and "no checkpoints" when you trust it. The states are honest: a run that is blocked says what it is waiting for and lets you resume once you have supplied it; a run interrupted by a deploy is told apart from a run that failed, because they are different things and call for different responses. The transcript is the receipt: every edit, every write, every web fetch the agent made is a card you can read, so "the agent did this" is never something you have to take on faith. And retry, resume, and cancel mean a run is a thing you steer, not a thing you launch and pray over.
For a senior PM, the run surface is the answer to "how much do I have to babysit this?" The honest answer is "as much or as little as the work deserves, and you choose per run." For an engineer, the transcript is where "the agent opened this PR" becomes "here is exactly what it changed and why." For a lead, the checkpoint in front of the pull-request step is where a team's automation stays accountable to a person. For a new hire, a completed run read top to bottom, the choices, the steps, the checkpoint, the transcript, is the whole operating model in one screen. A Flow run is autonomous where autonomy is safe and reviewable everywhere it matters.
The sibling surface. Lay out the steps, bind the skills, and set the checkpoints you approve here.
The concept: what a Flow is and how runs fit reviewable autonomy.
Approve, reject, or skip, and exactly how background runs handle a checkpoint.
Where runs appear alongside chats and tasks, with the cost and the resume controls.