PR review loop

What happens to a pull request after an agent opens it. Your reviewers comment on GitHub, the agent reviews its own diff, and both streams become findings you resolve with Fix this, which writes new commits and rechecks itself until the work converges or it asks for a person.

A pull request is where agent work meets your team. The agent opens it under your identity, your engineers read the diff and comment on GitHub the way they always have, and Disco Parrot automates the half that usually drags: the part where a comment becomes a change becomes another comment. It pulls your reviewers' comments in, lets the agent review its own diff alongside them, and turns each one into a finding you can hand back to an agent with one click. The agent answers with code, rechecks its own work, and keeps going until the finding is resolved or it has reached the point where a person should look.

This page is the loop itself: where the findings come from, what their severities and statuses mean, how Fix this dispatches an agent, and how the recheck decides when the work has converged. For how a pull request gets opened in the first place, and why GitHub stays the place you merge, read Ship code.

Where the review lives

Every implementation plan that has opened a pull request gets a review surface of its own. From the plan's PR badge you open the in-app review, a full page built for reading a change and acting on it, laid out as three panels you can resize and collapse to taste:

Plan context on the left, so the change is read against the work it was meant to do.
The diff in the center: the pull request's files in a tree you can filter, shown split or unified, with a Viewed checkbox per file so a long review remembers where you stopped. Click a finding on the right and the diff jumps to the line it is about, which is how the two panels work as one.
The conversation on the right, the stream of findings and comments the rest of this page is about. Its header keeps a live count of what is open and what is resolved, groups the stream by file, by severity, or by time, and hides resolved findings by default so the open work is what you see.

The surface is a page, not a popup, because reviewing a change is real work that deserves the room.

The review appears once a PR is linked

A plan gets its review surface the moment a pull request is opened against it. Until then the page shows a No PR yet prompt that points you to open one from the plan's Flows launcher or with /create-pr in chat.

The page stays in step with GitHub on its own. Once the pull request is merged or closed there, a banner marks it and the conversation goes read-only, so the in-app picture always matches the real one. If the pull request is deleted on GitHub, the page says so and offers a re-sync rather than guessing. GitHub stays the source of truth.

Screenshot to capture

The full-page in-app PR review at /planning/plans/:id/review, dark theme. A breadcrumb 'Plans / CSV download button / Review' and a title 'PR Review - CSV download button' with a 'PR #142' badge carrying an external-link glyph to GitHub. Three resizable panels: left 'Plan Context' (the plan's title, acceptance criteria, linked test cases), center a diff viewer (file tree on the left edge, a side-by-side diff with green additions and coral deletions), right a 'Conversation' panel showing a stack of finding cards. Thin draggable splitters between panels.

save as: public/docs-media/pr-review-surface.png

Caption when added: The in-app review: plan context, the diff, and the conversation of findings, side by side on one resizable page.

Where findings come from

A finding is one thing to address on the pull request. Two streams feed the conversation, and they sit side by side because to the work it does not matter which one raised the point.

Your reviewers' GitHub comments arrive through a webhook the moment they land. When a reviewer submits a review or leaves a comment on a line, the platform mirrors it into a finding attached to the plan, so the request shows up in the app next to the diff it is about without anyone copying anything across. The flow of comments is one direction only, from GitHub into Disco Parrot. The platform never writes a comment, reply, or approval back to the pull request; the only thing it adds there is the agent's commits. Your reviewers' threads, approvals, and merges stay entirely theirs.

The agent's own review is the second stream. An agent can review the diff it produced and raise findings against it, in the same shape a human reviewer would leave, so the obvious problems are caught and queued before a person spends attention on them. That review is held to the same bar as a person's, not a softer one: it can raise blocking findings against its own diff, and it cannot close a finding by saying so. A finding the agent raises is one the agent then has to answer with code, the same as a finding a reviewer left.

What a GitHub review becomes depends on what the reviewer did. An approval raises nothing; there is no work in a thumbs-up. A review that requests changes becomes a blocking finding, because the reviewer has said as much. A review left as a comment, and any inline comment on a line, becomes a question, the softest severity, because a comment is a prompt to look rather than a demand to change. The mapping is deliberate: it carries the reviewer's own weight into the app rather than flattening every comment into the same urgency.
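
The mapping above is small enough to write down as a function. What follows is a minimal sketch, not the platform's code: the `ReviewEvent` shape and the name `severityFor` are hypothetical, and the three review states are assumed to correspond to the states GitHub reports for a submitted review.

```typescript
// Severities as named on this page.
type Severity = "blocking" | "suggestion" | "nit" | "question";

// Assumption: the three outcomes a submitted GitHub review can carry.
type ReviewState = "approved" | "changes_requested" | "commented";

// Hypothetical, simplified event shape; a real webhook payload carries far more.
type ReviewEvent =
  | { kind: "review"; state: ReviewState }
  | { kind: "inline_comment" };

// Returns the severity of the finding to raise, or null when nothing is raised.
function severityFor(event: ReviewEvent): Severity | null {
  if (event.kind === "inline_comment") {
    return "question"; // a comment on a line is a prompt to look
  }
  switch (event.state) {
    case "approved":
      return null; // an approval raises nothing
    case "changes_requested":
      return "blocking"; // the reviewer has asked for a change
    case "commented":
      return "question"; // the softest severity
  }
}
```

Suggestion and Nit never come out of this function, which matches the text: the automatic mapping only ever assigns the two ends of the scale.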

Screenshot to capture

The Conversation panel showing two stacked finding cards to contrast their authorship, dark theme. A header strip reads '5 open · 2 resolved' with a 'Group by: File' dropdown and a 'Hide resolved' switch (on). Top card: a GitHub avatar plus name 'Tom', a red 'Blocking' pill, body 'Respect the existing column filter before exporting', a monospace file tag 'src/export/csv.ts:54'. Bottom card: a sparkle glyph plus 'AI Review' chip, an amber 'Suggestion' pill, body 'Extract the header-building into a helper; it is duplicated', file tag 'src/export/csv.ts:71'. Both carry 'Fix this' buttons.

save as: public/docs-media/pr-review-finding-streams.png

Caption when added: Two streams, one conversation: a reviewer's blocking comment and the agent's own suggestion sit together, each with Fix this.

Reading a finding

Every finding is a card in the conversation, and the card is built to answer the reviewer's first three questions at a glance: who raised it, how much it matters, and where it stands.

Who is the author chip. A finding from your reviewer carries their GitHub name and avatar; a finding from the agent's own review carries an AI Review mark, so you always know whether you are reading a teammate or the agent.

How much it matters is the severity, shown as a colored pill:

Severity	What it means
Blocking	Has to be addressed before the change should land. A review that requests changes becomes this.
Suggestion	Worth doing, not a gate. An improvement the reviewer would like to see.
Nit	A small, optional polish. Take it or leave it.
Question	A prompt to look or explain. Every inline comment and every commented review becomes this.

Blocking and Question are the severities that arrive from GitHub. Suggestion and Nit come from the agent's own review, or from a reviewer when someone sets a finding's severity by hand, so the softer end of the scale is where the agent's self-review and your team's judgment live.

Where it stands is the status, and the status is where the loop shows itself. A finding moves through a small, legible set of states:

Status	What it means
Open	Raised, not yet worked. The starting state for every finding.
Fix in progress	An agent is on it. The card reads Running · Attempt N/3 with a spinner while the loop runs.
Resolved	The fix was made and the recheck confirmed it. The card collapses and strikes through.
Regressed	A fix made something else break. The loop pauses and the card explains what regressed.
Needs human	The agent tried and did not converge. The card asks you to step in.
Won't fix	A person decided to leave it. A deliberate close, on the record.

Alongside those, a card carries the file and line it is about, the comment body, any acceptance criterion or test case it is linked to, and a suggested fix when the reviewer offered one. It is a complete picture of one point of review, in one place, whether the point came from a person or the agent.

The recheck drives the automatic states. Regressed and Needs human wait for a person, who picks Retry, Mark resolved, or Won't fix.
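
One way to picture the statuses is as a small state machine. The sketch below is an illustration inferred from this page, not the platform's actual rules: the `transitions` table and the `canTransition` helper are hypothetical, and the edges for re-opening a closed finding assume the manual re-open described under permissions applies to both Resolved and Won't fix.

```typescript
type Status =
  | "open"
  | "fix_in_progress"
  | "resolved"
  | "regressed"
  | "needs_human"
  | "wont_fix";

// Assumed transition table. A "still open" recheck keeps the finding in
// fix_in_progress and only bumps the attempt count, so it is not listed.
const transitions: Record<Status, readonly Status[]> = {
  open: ["fix_in_progress", "resolved", "wont_fix"],
  // "open" here is the rollback when a dispatch cannot start.
  fix_in_progress: ["resolved", "regressed", "needs_human", "open"],
  // The two hand-off states wait for Retry, Mark resolved, or Won't fix.
  regressed: ["fix_in_progress", "resolved", "wont_fix"],
  needs_human: ["fix_in_progress", "resolved", "wont_fix"],
  // Assumption: a closed finding can be re-opened by hand.
  resolved: ["open"],
  wont_fix: ["open"],
};

function canTransition(from: Status, to: Status): boolean {
  return transitions[from].indexOf(to) !== -1;
}
```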

Screenshot to capture

A single finding card in the Conversation panel, dark theme. Top row: a sparkle glyph + 'AI Review' author chip, a red 'Blocking' severity pill with a warning glyph, and a monospace file:line citation 'src/export/csv.ts:54'. A status badge on the right reads 'Open' (neutral). The card body: 'The new button should respect the existing column filter before exporting.' Below the body a linked badge 'AC-3' and a primary 'Fix this' button with a wand glyph.

save as: public/docs-media/pr-review-finding-card.png

Caption when added: One finding: who raised it, its severity, the line it is about, where it stands, and Fix this to hand it to an agent.

What a good finding looks like

A finding is only as useful as the change it points at. The ones that move the work share a shape, and it is the same shape whether a reviewer left it or the agent's own review raised it: a specific place, a specific problem, and enough of the why that an agent can act without guessing.

A good finding names the line, not the file. "The export should respect the column filter" attached to src/export/csv.ts:54 tells an agent exactly where to read and what to change. The same point left as "exports look wrong" attached to nothing sends the agent hunting, and a hunt is where a fix goes sideways. The file-and-line citation on every card is not decoration; it is the difference between a finding an agent can answer in one pass and one it has to interpret first.

A good finding carries a reason, not just a verdict. "Extract this into a helper, it is duplicated three times" tells an agent what done looks like. "Clean this up" does not, and an agent that does not know what done looks like is the agent that takes all three attempts and still lands on Needs human. When a reviewer offers a suggested fix, the card keeps it, and the agent reads it as the target rather than reverse-engineering one.

This is also why the agent's self-review is held to the human bar rather than a softer one. A review that raised vague findings against its own diff would be noise the team learned to scroll past. By raising the same specific, line-anchored, reasoned findings a careful reviewer would, the agent's review earns the same attention, and the obvious problems are caught and queued before a person spends a minute on them. The bar is what makes a second stream worth having instead of a second thing to ignore.

Fix this: a comment becomes a commit

A finding with a Fix this button is one you can hand straight to an agent. The button appears on the findings worth automating: the blocking ones and the suggestions, the points that ask for a code change rather than a conversation.

Click it and the loop starts. An agent picks the finding up in a fresh sandbox, makes the change it asks for, and pushes it as new commits on the same pull request. The card flips to Fix in progress the instant you click, so you can watch the work rather than wonder if it took. If the dispatch cannot start, the card rolls back to Open and offers a Retry, so a click never quietly does nothing.
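
The click behavior described above is an optimistic update with a rollback. Here is a minimal sketch of that pattern, assuming a hypothetical `startDispatch` call that stands in for whatever actually launches the agent; the `Card` shape and the `fixThis` name are illustrative only.

```typescript
type Status = "open" | "fix_in_progress";

// Hypothetical, simplified view of a finding card.
interface Card {
  status: Status;
  canRetry: boolean;
}

// `startDispatch` is a stand-in for the real dispatch call, which this
// page does not describe. It is assumed to reject when the dispatch
// cannot start.
async function fixThis(
  card: Card,
  startDispatch: () => Promise<void>,
): Promise<void> {
  // Flip the card before waiting on anything, so the click is visible at once.
  card.status = "fix_in_progress";
  card.canRetry = false;
  try {
    await startDispatch();
  } catch {
    // Roll back and offer Retry, so a click never quietly does nothing.
    card.status = "open";
    card.canRetry = true;
  }
}
```

The point of the ordering is that the status changes before the first `await`, so the card never waits on the network to acknowledge the click.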

The fix arrives the way the original change did, as commits your team reviews and merges on GitHub. The agent answers review by doing more work, never by editing the conversation about it: it does not reply to the thread, resolve the comment, or approve its own change. The request stays human and stays on GitHub; the only thing the agent adds is code.

Screenshot to capture

The same finding card mid-fix, dark theme. The status badge now reads 'Running · Attempt 1/3' with a small spinner, in a brand-cyan tint. The 'Fix this' button is replaced by a muted line 'Agent addressing this in a sandbox' and a faint branch tag 'pr/maya/1706-2042'. The severity pill still reads 'Blocking'. A subtle 'Show attempts (1)' accordion control sits at the card's bottom edge.

save as: public/docs-media/pr-review-fix-running.png

Caption when added: The moment you click Fix this: the finding goes to Fix in progress and an agent takes it up in a sandbox.

The recheck loop: how the work converges

This is the part that makes the loop a loop. An agent that makes a change is not done; an agent that makes a change and then checks whether it actually worked is. After the agent applies a fix, it rechecks its own work, and what the recheck finds decides what happens next.

You do not run the recheck; it runs itself, after every fix, and the finding's status is how you watch it. The card updates live as the loop turns, with no refresh, so the attempt count climbing, a regression, or a hand-off back to you appears the moment it happens.

There are four ways a recheck can land, and each one moves the finding somewhere you can read:

Resolved. The fix worked and nothing else broke. The finding goes to Resolved, the card collapses, and that point of review is closed. When the last finding resolves, the loop is done.
Still open. The fix helped but did not finish the job. The agent queues another attempt, a fresh implement-and-recheck pass, and goes again. The card counts up, Attempt 2/3, so you can see it working rather than guessing.
Regressed. The fix made something else break, and the recheck caught it because it runs your test cases, not just the one the finding named. This is the difference between a fix and a regression you find next week: an agent that rechecked only the line it was asked about would close findings while quietly breaking the file around them. The loop pauses rather than digging the hole deeper, and the card shows a banner naming what regressed. From there the move is yours: Retry to send the agent back in with the regression named, Won't fix to close the finding on the record, or fix the line yourself and Mark resolved. The loop hands you the controls; it does not pick for you.
Needs human. The agent took its attempts, three by default, and did not converge. Rather than loop forever, it stops and hands the finding back marked Needs human. For the people who can dispatch fixes, the card carries Retry, Mark resolved, and Won't fix right there, so the next move is one click. The agent knowing when to stop is as much a part of the design as the agent knowing how to fix.
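
The four outcomes can be read as one decision over a recheck's results. The sketch below is a guess at the shape of that decision, not the platform's logic: `TestRun`, `RecheckInput`, `newlyFailing`, and `classify` are all hypothetical names, and checking for a regression before anything else is an assumption.

```typescript
type Verdict = "resolved" | "still_open" | "regressed" | "needs_human";

// Hypothetical shape: test name mapped to whether it passed.
type TestRun = Record<string, boolean>;

interface RecheckInput {
  findingAddressed: boolean; // did the fix satisfy the finding itself?
  before: TestRun; // results before the fix
  after: TestRun; // results after the fix
  attempt: number; // 1-based attempt number
  maxAttempts: number; // three by default, per this page
}

// A regression is a test that passed before the fix and fails after it.
function newlyFailing(before: TestRun, after: TestRun): string[] {
  return Object.keys(after).filter(
    (name) => before[name] === true && after[name] === false,
  );
}

function classify(input: RecheckInput): Verdict {
  // Assumption: a regression pauses the loop even if the finding looks fixed.
  if (newlyFailing(input.before, input.after).length > 0) return "regressed";
  if (input.findingAddressed) return "resolved";
  return input.attempt >= input.maxAttempts ? "needs_human" : "still_open";
}
```

Comparing whole test runs, rather than only the test the finding named, is what lets the regression branch exist at all.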

The cap is the point. An agent that retries without limit is a worse problem than the bug it was chasing. Three attempts is enough to clear the fixes that are genuinely mechanical, and it stops short of the point where more tries stop helping. When the cap is reached, the work lands in front of a person with its history attached rather than disappearing into a retry that never ends.
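
A capped retry loop is simple to sketch. This one is synchronous for brevity and rests on a hypothetical `attemptFix` callback that applies a fix, runs the recheck, and returns its verdict; none of these names come from the product.

```typescript
type Verdict = "resolved" | "still_open" | "regressed";
type Outcome = "resolved" | "regressed" | "needs_human";

interface LoopResult {
  outcome: Outcome;
  attempts: number; // how many attempts were actually made
}

// `attemptFix` is hypothetical: one implement-and-recheck pass.
function runLoop(
  attemptFix: (attempt: number) => Verdict,
  maxAttempts: number = 3,
): LoopResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const verdict = attemptFix(attempt);
    if (verdict === "resolved") {
      return { outcome: "resolved", attempts: attempt };
    }
    if (verdict === "regressed") {
      // Pause for a person instead of digging the hole deeper.
      return { outcome: "regressed", attempts: attempt };
    }
    // still_open: fall through and go again
  }
  // The cap was reached without converging: hand the finding to a person.
  return { outcome: "needs_human", attempts: maxAttempts };
}
```

The loop has exactly three exits, and only one of them is success; the other two both end with a person looking at the finding.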

Reading the attempts

Every finding that went through the loop keeps its work on the record. A Show attempts control on the card opens the timeline: one entry per attempt, each with the commit the agent pushed (linked to GitHub), the tests it ran and how many passed, and the verdict the recheck reached. When a fix took three tries, you can read all three, see what each commit changed, and see exactly where the recheck turned. The loop is auditable by construction, because every turn of it left a commit and a result behind.

Screenshot to capture

A finding card with its 'Show attempts (3)' accordion expanded, dark theme. A vertical timeline of three nodes: 'Attempt 1' (a commit SHA 'a1b2c3d' linked with an external glyph, '4/5 tests', a red 'Failed' verdict badge), 'Attempt 2' (commit 'e4f5a6b', '5/5 tests', an amber '1 new blocking' badge meaning a regression), 'Attempt 3' (commit 'c7d8e9f', '6/6 tests', a green 'Passed' verdict). Each node has a relative timestamp. The finding's status badge above reads 'Resolved' (green).

save as: public/docs-media/pr-review-attempts.png

Caption when added: Show attempts: every pass the agent made, with the commit it pushed, the tests it ran, and where the recheck turned.

A finding, end to end

Tom reviews Maya's CSV download button pull request, PR #142, and leaves one comment on a line in src/export/csv.ts: the new button should respect the existing column filter before exporting. He submits the review as Request changes.

The webhook fires. The comment lands in the app as a finding on the plan, marked Blocking, because Tom requested changes rather than only commenting. Maya sees it on the review surface next to the diff it is about, reads it, and clicks Fix this.

The card flips to Fix in progress. An agent picks the finding up in a fresh sandbox, reads the filter code the comment points at, makes the change, and pushes it as a new commit on PR #142. Then it rechecks, running the plan's test cases rather than checking only the line Tom named. One of them fails: an export test that assumed no filter. The fix helped but did not finish, so the card counts up to Attempt 2/3 and the agent goes again, this time carrying the filter through the header row too. The recheck passes clean. The finding goes to Resolved and the card collapses.

Maya never wrote the fix and never argued in the thread. Tom re-reviews the two commits on GitHub, sees his comment answered in code, approves, and merges. The whole turn, both attempts and the test runs that decided them, sits behind Show attempts on the finding for anyone who reads the pull request later.

Screenshot to capture

A finding card in the Conversation panel mid-loop, dark theme. The status badge reads 'Regressed' in an amber tint. A banner across the card body reads 'Attempt 2 introduced a new failure: export-with-filter.test.ts now fails (was passing).' The loop is visibly paused (no spinner), and three controls sit at the card foot: Retry, Mark resolved (greyed/disabled), Won't fix. The severity pill still reads 'Blocking'. A 'Show attempts (2)' accordion at the bottom edge.

save as: public/docs-media/pr-review-regressed-banner.png

Caption when added: A regression pauses the loop instead of digging deeper: the card names the test that broke and hands the next move to a person.

Many findings, many reviewers

One finding on one pull request is the easy case. A real review has a dozen findings across a handful of reviewers, some blocking and some not, some answered in one attempt and some that land back on a person, and the surface is built so the dozen reads as cleanly as the one.

The conversation stays one stream no matter how many people fed it. A blocking comment from Tom, a question from Priya, and three suggestions from the agent's own review sit in the same panel, each carrying its author chip, so you read the change rather than chase who-said-what across tabs.

When the stream gets long, the header narrows it. Group by severity to put the blocking findings first, group by file to read a change one file at a time, or leave resolved findings hidden so only the open work shows. A reviewer who left one comment on GitHub can open the surface, find it as a finding, and watch it resolve without learning a new tool.
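
The header's controls amount to a filter, a grouping, and a count. A minimal sketch under stated assumptions: the `Finding` shape is hypothetical and collapses the status to a single `resolved` flag, and the severity order simply follows the table earlier on this page.

```typescript
type Severity = "blocking" | "suggestion" | "nit" | "question";

// Hypothetical, simplified finding; the real one carries more fields.
interface Finding {
  id: string;
  file: string;
  severity: Severity;
  resolved: boolean;
}

// Assumed display order: blocking first, question last.
const severityOrder: readonly Severity[] = ["blocking", "suggestion", "nit", "question"];

// Resolved findings are hidden by default so the open work is what shows.
function visible(findings: Finding[], hideResolved: boolean = true): Finding[] {
  return hideResolved ? findings.filter((f) => !f.resolved) : findings;
}

// Groups in severity order, skipping severities with no findings.
function groupBySeverity(findings: Finding[]): Map<Severity, Finding[]> {
  const groups = new Map<Severity, Finding[]>();
  for (const severity of severityOrder) {
    const members = findings.filter((f) => f.severity === severity);
    if (members.length > 0) groups.set(severity, members);
  }
  return groups;
}

// The header's count of open and resolved findings.
function headerCounts(findings: Finding[]): { open: number; resolved: number } {
  const resolved = findings.filter((f) => f.resolved).length;
  return { open: findings.length - resolved, resolved };
}
```

Grouping by file or by time would follow the same pattern with a different key.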

Findings resolve independently, not in a line. Fix this on one finding sends one agent into one sandbox, and dispatching a fix on a second finding does not wait on the first. Each finding runs its own implement-and-recheck loop and reports back on its own card.

Several findings can be in progress at once, each pushing its own commits, each rechecking against the plan's test cases, so the slow part is the reviewing you wanted to do anyway rather than a queue. The loop is per-finding by design, which is what lets a heavy review clear in close to the time its heaviest single finding takes.
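
Independent loops map naturally onto concurrent promises. The sketch below assumes a hypothetical `FixLoop` function that runs one finding's implement-and-recheck loop and reports its outcome as a value rather than by rejecting; `dispatchAll` is an illustrative name, not a product API.

```typescript
interface FixResult {
  findingId: string;
  outcome: "resolved" | "regressed" | "needs_human";
}

// Hypothetical: runs one finding's whole loop. Assumed never to reject,
// since every ending is reported as an outcome.
type FixLoop = (findingId: string) => Promise<FixResult>;

// Starts every loop before awaiting any of them, so the total time is
// bounded by the slowest finding rather than the sum of all of them.
async function dispatchAll(
  findingIds: string[],
  loop: FixLoop,
): Promise<FixResult[]> {
  const running = findingIds.map((id) => loop(id));
  return Promise.all(running);
}
```

Results come back in the order the findings were dispatched, whichever loop finishes first.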

What does not parallelize is the judgment, and that is the point. The agents do the typing on as many findings as you dispatch; the decisions about what to merge stay with the reviewers, on GitHub, in the order they choose.

Screenshot to capture

The Conversation panel on a busy review, dark theme. A header strip reads '12 findings · 3 blocking · 9 resolved' with a 'Group by: Severity' dropdown active and a 'Hide resolved' switch on. Under a 'Blocking (3)' group heading, three stacked finding cards from mixed authors: one with Tom's GitHub avatar and a red 'Blocking' pill (status 'Resolved', collapsed and struck through), one with Priya's avatar and a red 'Blocking' pill (status 'Running · Attempt 2/3' with a spinner), and one with a sparkle 'AI Review' chip and a red 'Blocking' pill (status 'Open', a 'Fix this' button). Below, a collapsed 'Question (2)' group heading.

save as: public/docs-media/pr-review-conversation-grouped.png

Caption when added: A heavier review, grouped by severity: mixed authors, mixed states, each finding on its own loop, all in one stream.

Who can review and who can fix

Everything above is what the loop does. This is who gets to drive it: reading a review is open to your whole team, and dispatching an agent at it is held a little closer.

Anyone who can see the plan, viewers and members included, can open its review, read the diff, and read every finding, so the engineer who left the comment on GitHub can watch it become a resolved finding without any extra grant. Dispatching a Fix this, editing a finding, or marking one resolved or won't-fix takes the review-findings.manage scope, which rides on your planners, admins, and owners. The split matches the weight of the action: reading a review is something everyone does, and starting agent work against your codebase is something a smaller, accountable set of people does.

That same scope is what lets a person take a finding in hand directly rather than through the loop. Anyone who holds it can resolve or re-open any finding by hand at any time, not only after a hand-off, and can reclassify a finding's severity when the automatic mapping read it heavier or lighter than the team would: downgrade a blocking comment to a nit, or mark a question won't-fix and move on. The loop is the fast path; the manual controls are always there underneath it.

You want to	Scope
Open the review, read the diff and the findings	`pr-review.read` + `review-findings.read`
Dispatch Fix this, edit a finding, mark resolved or won't-fix	`review-findings.manage`
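
The table reads as a lookup from action to required scopes. A minimal sketch of such a check: the three scope strings come from the table, while the `Action` names, the `required` map, and the `can` helper are hypothetical. It does not assume that the manage scope implies the read scopes, since this page does not say.

```typescript
type Scope =
  | "pr-review.read"
  | "review-findings.read"
  | "review-findings.manage";

// Hypothetical action names for the rows of the table.
type Action =
  | "read_review"
  | "fix_this"
  | "edit_finding"
  | "mark_resolved"
  | "mark_wont_fix";

const required: Record<Action, readonly Scope[]> = {
  read_review: ["pr-review.read", "review-findings.read"],
  fix_this: ["review-findings.manage"],
  edit_finding: ["review-findings.manage"],
  mark_resolved: ["review-findings.manage"],
  mark_wont_fix: ["review-findings.manage"],
};

// An action is allowed only when every scope it requires has been granted.
function can(action: Action, granted: ReadonlySet<Scope>): boolean {
  return required[action].every((scope) => granted.has(scope));
}
```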

Why the loop works this way

The hard part of agent-written code was never the first draft. It was the review, the back-and-forth that turns one comment into a week: a reviewer asks for a change, someone makes it, the reviewer checks it, something else broke, and around again. Disco Parrot keeps the half of that loop that belongs to people exactly where it is, on GitHub, in your reviewers' own words, and automates the half that was always mechanical: making the change, checking it, and trying again until it holds.

For a planner, the review surface is the one place a pull request stops being a thing happening somewhere else: the plan you wrote, the comment your reviewer left, and the commit that answered it all read on one page, so "where does PR #142 stand" is a glance, not a standup question.

For an engineer, Fix this answers the comment you left with commits you review, not a thread the agent argues in, and the recheck runs your test cases so a fix that breaks something else is caught before you see it.

For a lead, the loop stops itself: three attempts, then a person, with every commit and every test result on the record, so an agent never grinds unattended against a problem it cannot solve.

For the person who has to trust agents near your codebase, the line is bright: the agent reads your reviewers' comments and answers them with code, and it never writes a word back to the pull request, never resolves a thread, never approves itself. Review stays human; the agent does the typing.

Ship code

How the pull request opens under your identity, and why GitHub holds the merge.

Reviewable autonomy

The operating model the loop is one expression of: the agent proposes, a person decides.

Test cases

What the recheck runs to confirm a fix held and nothing else broke.

Documentation health reviews

Where a reviewer's comment that the docs should have caught becomes a signal a later review acts on.