Audit and evidence

The record you hand a reviewer. Every change and every boundary event is written down with who did it and whether it was a person or an agent, the agent's edits are marked apart from your team's, the log exports to a scoped CSV, and the platform's security events carry their own evidence pipeline you can stream to your own Log Analytics.

When an agent changes a record, or the boundary refuses a credential, the question a governance owner has to be able to answer is the plain one: is there a record of it, and can you produce it on demand? This page is that answer. Disco Parrot writes down what happened, marks the agent's hand apart from your team's, lets you export the part a reviewer asked for, and carries the platform's own boundary events through an evidence pipeline you can stream to your security team's tooling. It is the attestation layer the rest of this section points back to. The everyday version of the tenant record is covered in Audit trails; this page is the governance view, with the security-evidence pipeline that the concept page only touches.

What lands on the record

The audit log is the tenant-facing record of discrete events, and it spans more than the work. It captures the lifecycle of every record (creates, updates, deletes, and restores on initiatives, plans, bugs, projects, portfolios, teams, sprints, goals, and test cases), the configuration behind the work (edits to skills, agent instructions, sandbox profiles, environments, connected-tool servers, and runtime configs), and the membership and authorization changes (invitations, role changes, scope denials, license events). Alongside those, it records the events a security review cares about most: every agent-authored edit, the full lifecycle of every credential lease, every secret the boundary refused to inject into a sandbox, and every managed command an agent ran.

The audit log spans the work, the configuration behind it, and the membership and authorization around it. Set apart is the band a security review reads first: the agent's own edits, the full lifecycle of every credential lease, the secrets the boundary refused by name, and the managed commands an agent ran.

Each entry is shaped to read without a lookup. A row carries who acted (the actor's id and display name), what they did (one of a fixed set of actions, from created and updated to granted, denied, and revoked), which record it touched (the entity type, id, and a human label), what changed (for an update, the fields that moved with their before and after values), a one-line summary, a content hash tying the row to the version it produced, a server-assigned timestamp, and the mark that matters most below. A refused secret is recorded by its name, never its value, so the evidence shows the block without leaking the credential it blocked.
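As a rough illustration of that row shape, here is a minimal Python sketch. The class names, field names, and sample values (including the ids, hash, and timestamp) are assumptions made for the example, not the platform's actual schema; only the list of things a row carries comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FieldChange:
    # One field that moved in an update, with its before and after values.
    name: str
    before: str
    after: str


@dataclass(frozen=True)
class AuditEntry:
    # Hypothetical field names; the real schema may differ.
    actor_id: str
    actor_name: str
    action: str          # e.g. "created", "updated", "granted", "denied", "revoked"
    entity_type: str
    entity_id: str
    entity_label: str
    changes: tuple       # tuple of FieldChange, empty for non-update actions
    summary: str
    content_hash: str
    timestamp: datetime  # server-assigned in the real system
    edit_source: str     # "person" or "agent"


def describe(entry: AuditEntry) -> str:
    """Render a row as one readable line, with no lookup needed."""
    moved = ", ".join(f"{c.name}: {c.before} -> {c.after}" for c in entry.changes)
    return (
        f"[{entry.edit_source}] {entry.actor_name} {entry.action} "
        f"{entry.entity_type} '{entry.entity_label}' ({moved})"
    )


# Illustrative values only.
entry = AuditEntry(
    actor_id="agent-001",
    actor_name="Codex agent",
    action="updated",
    entity_type="plan",
    entity_id="plan-042",
    entity_label="Checkout flow rework",
    changes=(
        FieldChange("Estimate", "5", "8"),
        FieldChange("Status", "todo", "in progress"),
    ),
    summary="Updated estimate and status",
    content_hash="<placeholder-hash>",
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    edit_source="agent",
)

print(describe(entry))
```

The entry is frozen to echo the idea that a row, once written, is not edited in place.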

The agent's hand is always marked apart

Every entry carries an edit source, and this is the field a governance review leans on hardest. A change is marked as made by a person or by an agent, through a guarded, permission-checked path; the schema reserves a third mark for the platform's own writers, though the marks in use today are person and agent. So an agent's edits never blend into your team's: a record that changed on its own is never a mystery, because the row says an agent did it and which agent. The same mark stamps the version history of a record, so the two records agree about who made a change. An entry is written as the change happens, not reconstructed afterward, and the platform keeps the log as an append-only record: there is no product path to edit or delete an individual entry, and only the retention window removes aged rows. A row, once written, is the row a reviewer reads.

This is what makes "an agent did real work here" a fact a reviewer can stand on rather than a worry. The labor an agent did is enumerated and attributed, set apart from the labor a person did, on every record it touched.

Figure: One update, expanded. The AI mark names the agent that made the change, the changed fields carry their before and after values, and the content hash ties the row to the exact version it produced.

What an agent ran, not only what it changed

A record of what changed is one half of "an agent did real work here." The other half is what the agent ran. When an agent runs a named command inside its sandbox, the platform records the request to run it, the result when it completes, and the refusal when a policy turns it down. That executed action is the bridge between "an agent decided" and "an agent did": a reviewer can see not only that a record changed but that the agent ran the command that produced the change, and that the command was one its approval allowed. A denied command is on the record with its reason, so a refusal reads as legibly as a success, and these rows carry their own entity type, so a review can pull just the commands an agent ran when that is the question. This is also one of the events that lands in both the tenant trail and the security evidence stream, so it sits in the everyday record and the auditor's record at once.
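To make the "pull just the commands an agent ran" idea concrete, here is a small Python sketch over rows that have already been exported. The entity type name `managed-command`, the action names `requested` and `completed`, the command names, and the row keys are all assumptions for illustration; the text above only establishes that requests, results, and refusals are recorded, that denials carry a reason, and that these rows have their own entity type.

```python
# Hypothetical exported rows; keys and values are illustrative only.
ROWS = [
    {"entity_type": "managed-command", "action": "requested", "command": "run-tests", "reason": None},
    {"entity_type": "managed-command", "action": "completed", "command": "run-tests", "reason": None},
    {"entity_type": "managed-command", "action": "denied", "command": "deploy", "reason": "not in approved set"},
    {"entity_type": "plan", "action": "updated", "command": None, "reason": None},
]


def command_rows(rows):
    """Keep only the rows that describe a managed command."""
    return [r for r in rows if r["entity_type"] == "managed-command"]


def denied_with_reason(rows):
    """List (command, reason) pairs for every refused command."""
    return [
        (r["command"], r["reason"])
        for r in command_rows(rows)
        if r["action"] == "denied"
    ]


print(len(command_rows(ROWS)))
print(denied_with_reason(ROWS))
```

Because a denial is a row like any other, the same filter that answers "what did the agent run" also answers "what was it refused, and why".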

Hand a reviewer the rows that matter

An audit log is only useful in a review if you can produce the right slice of it, so the log filters and exports. You read it in the Audit Log under Platform, filter by source and by entity type, and export the filtered view to a CSV scoped to exactly those filters, so you hand an auditor the rows their question is about rather than a year of everything. A single export is capped so a file stays manageable, and when a filter matches more rows than the cap the response flags that it was truncated, so an operator wiring an automated pull can detect it and narrow the filter rather than quietly miss rows.
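For an operator wiring an automated pull, the truncation flag suggests a simple pattern: try the broad filter first and, if the response is flagged, split it into narrower filters. The Python sketch below simulates that against in-memory rows. `fake_export`, the `truncated` key, the cap of 3, and the row shape are all stand-ins invented for the example; the real export interface and cap are not described here.

```python
CAP = 3  # hypothetical cap, kept tiny so the demo truncates


def fake_export(rows, source, entity_type=None):
    """Stand-in for an export call: returns capped rows plus a truncation flag."""
    matched = [
        r for r in rows
        if r["source"] == source
        and (entity_type is None or r["entity_type"] == entity_type)
    ]
    return {"rows": matched[:CAP], "truncated": len(matched) > CAP}


def pull_all(rows, source, entity_types):
    """Pull every matching row, narrowing by entity type when truncated."""
    first = fake_export(rows, source)
    if not first["truncated"]:
        return first["rows"]
    collected = []
    for entity_type in entity_types:
        part = fake_export(rows, source, entity_type)
        if part["truncated"]:
            # Still too broad: fail loudly rather than quietly miss rows.
            raise RuntimeError(f"still truncated for {entity_type!r}; narrow further")
        collected.extend(part["rows"])
    return collected


SAMPLE = (
    [{"source": "AI", "entity_type": "plan", "id": i} for i in range(3)]
    + [{"source": "AI", "entity_type": "bug", "id": i} for i in range(2)]
    + [{"source": "human", "entity_type": "plan", "id": 99}]
)

print(len(pull_all(SAMPLE, "AI", ["plan", "bug"])))
```

The design choice worth copying is the failure mode: when narrowing is not enough, the pull raises instead of returning a partial file that looks complete.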

For a single record, the trail comes pre-scoped: open a plan, a bug, or a sandbox profile and its own audit feed is right there, so a reviewer can read the history of one record without filtering the whole log. And the same tenant-wide trail surfaces as the everyday activity feed in Settings, a friendlier lens on the identical record for catching up day to day, gated by the same audit.read scope. One trail, two views: the activity feed to keep up, the Audit Log to filter, export, and investigate.

Running an export is itself an event the platform records: the filters you used and the number of rows produced. The exported rows are not copied into a second record, but the act of exporting is on the trail, so the question "who pulled evidence, and what did they pull" has its own answer.

Figure: The audit log filtered to one event type and exported to a scoped CSV. A reviewer gets the rows their question is about, the refused secret shown by name and never by value.

A record built for a review, with retention to match

Two retention windows sit behind this, by design. The tenant audit log is retained for about a year by default, sized for everyday operations, and the platform enforces the window, so anything past it is removed automatically. The platform's own security events, the boundary-level record below, carry their own retention contract, configurable to the longer window that a security team's agreements require. For a workspace that needs a compliance archive beyond what the operational log holds, the security event stream is the channel you wire into your archive of record.
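A retention window of this kind reduces to a date comparison. The Python sketch below assumes a 365-day window as a reading of "about a year"; the exact default, the function names, and the row shape are assumptions, and the real sweep runs inside the platform rather than in your code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "about a year" modelled as 365 days.
TENANT_AUDIT_RETENTION = timedelta(days=365)


def is_past_retention(written_at, now, window=TENANT_AUDIT_RETENTION):
    """True when a row is older than the retention window."""
    return now - written_at > window


def sweep(rows, now, window=TENANT_AUDIT_RETENTION):
    """Return only the rows still inside the window."""
    return [r for r in rows if not is_past_retention(r["timestamp"], now, window)]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
rows = [
    {"id": "old", "timestamp": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": "recent", "timestamp": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]

print([r["id"] for r in sweep(rows, now)])
```

If you archive through the security event stream, this is the arithmetic to keep in mind: anything you need beyond the operational window has to be in your archive before the window closes on it.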

The security boundary keeps its own evidence

The audit log is the tenant's record. Underneath it, the platform keeps a separate, operator-facing record of the events at the security boundary, and this is the record a security team brings to an auditor, kept apart from the day-to-day log. A boundary event (a bundle issued to a host, a credential lease granted or denied, a secret kept out of a sandbox, a session revoked, a document downloaded, an audit export run, a retention or purge operation) becomes a security event with a fixed, reviewable shape: who, what, the outcome, the scope or entitlement involved, and a payload bounded by a rule rather than left open. Each event is classified for where it belongs: some land in the tenant audit log, some in the security evidence stream, and some in both, so the right record carries the right event without one trail standing in for the other.
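The classification step can be pictured as a routing table. In the Python sketch below, only the `managed_command` route (both destinations) reflects something this page states; the other two routes, the event type names, and the function names are hypothetical and exist only to show all three cases.

```python
from enum import Flag, auto


class Destination(Flag):
    TENANT_AUDIT = auto()
    SECURITY_STREAM = auto()


BOTH = Destination.TENANT_AUDIT | Destination.SECURITY_STREAM

ROUTES = {
    "managed_command": BOTH,                       # stated on this page: lands in both
    "bundle_issued": Destination.SECURITY_STREAM,  # hypothetical route
    "record_updated": Destination.TENANT_AUDIT,    # hypothetical route
}


def route(event_type, event, tenant_log, security_stream):
    """Append the event to each record its classification names."""
    destination = ROUTES[event_type]  # unknown types raise KeyError on purpose
    if Destination.TENANT_AUDIT in destination:
        tenant_log.append(event)
    if Destination.SECURITY_STREAM in destination:
        security_stream.append(event)


tenant_log, security_stream = [], []
route("managed_command", {"id": 1}, tenant_log, security_stream)
route("bundle_issued", {"id": 2}, tenant_log, security_stream)
route("record_updated", {"id": 3}, tenant_log, security_stream)

print([e["id"] for e in tenant_log])
print([e["id"] for e in security_stream])
```

An unclassified event type fails rather than defaulting to one trail, which mirrors the idea that neither record stands in for the other.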

A boundary event becomes a security event with a fixed shape and a sensitivity that bounds what it may carry, so a credential-adjacent event records that a secret was handled without recording the secret. Where the events go is yours to decide and opt-in: nothing leaves by default, and you wire the stream to your own Log Analytics. For a formal handoff the pipeline produces a hash-verified evidence package.

The rule on the payload is the part a security owner should know about. Every security event carries a sensitivity, one of four levels, and the level governs what the event is allowed to carry. A tenant-safe event can hold ordinary detail; a credential-adjacent event, the kind raised when a secret is involved, is held to a strict allow-list so it records that a credential was handled without ever recording the credential. So the evidence record is bounded at the point it is written, not scrubbed after the fact. The same redaction that keeps secret values out of the audit log keeps them out of the security stream.
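One way to picture "bounded at the point it is written" is a check that runs before an event is accepted, rather than a scrub that runs afterward. The Python sketch below models only the two sensitivity levels this page names; the other two levels are left out rather than guessed. The allow-listed keys and the function name are assumptions for illustration.

```python
# Hypothetical allow-list for credential-adjacent events.
CREDENTIAL_ADJACENT_ALLOWED = {"secret_name", "outcome", "reason"}


def bound_payload(sensitivity, payload):
    """Accept a payload only if it fits what its sensitivity level may carry."""
    if sensitivity == "tenant-safe":
        return dict(payload)
    if sensitivity == "credential-adjacent":
        extra = set(payload) - CREDENTIAL_ADJACENT_ALLOWED
        if extra:
            # Reject at write time instead of scrubbing after the fact.
            raise ValueError(f"keys not allowed: {sorted(extra)}")
        return dict(payload)
    raise ValueError(f"sensitivity level not modelled in this sketch: {sensitivity!r}")


accepted = bound_payload(
    "credential-adjacent",
    {"secret_name": "ANTHROPIC_API_KEY", "outcome": "refused"},
)
print(accepted)

try:
    bound_payload(
        "credential-adjacent",
        {"secret_name": "ANTHROPIC_API_KEY", "secret_value": "not-a-real-value"},
    )
except ValueError as error:
    print("rejected:", error)
```

An allow-list fails closed: a new field is excluded until someone decides it is safe, which is the opposite of a deny-list that has to anticipate every way a secret might appear.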

Where these events go is yours to decide, and it is opt-in. The platform can stream its security events to your own Azure Log Analytics, into a known table, so your security team reads them in the tooling they already run. Until you wire that up, the events are validated and held to their shape but not sent anywhere outside the platform, so turning on the stream is a deliberate step you take when you want the events in your own systems, not a default that ships them somewhere by surprise.

For a formal handoff, the platform's evidence pipeline produces an evidence package on demand: a manifest of the events and artifacts in scope, each artifact carried with its SHA-256 hash and the manifest itself hashed over its canonical content, plus a verification record that checks those hashes and flags any mismatch. The hash is what lets a reviewer confirm the package they hold is the package the platform produced, unchanged. The manifest is checked to carry no credential reference before it is sealed, so the evidence you hand over is complete and clean at once. The CSV export is the operational slice you read day to day; the evidence package is the sealed, hash-verified form for a formal handoff, so the right tool meets the right moment. The verification is itself recorded as a security event, so the trail carries not only the package but the record of when it was checked and that it held.
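The hashing scheme described here can be sketched with Python's standard library. The manifest layout, the key names, and the choice of sorted-key compact JSON as the "canonical content" are assumptions made for the example; the platform's actual canonical form is not specified on this page, so a real package should be verified with the platform's own verification record.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_bytes(entries) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(artifacts: dict) -> dict:
    """Hash each artifact, then hash the manifest's canonical content."""
    entries = [
        {"name": name, "sha256": sha256_hex(data)}
        for name, data in sorted(artifacts.items())
    ]
    return {"artifacts": entries, "manifest_sha256": sha256_hex(canonical_bytes(entries))}


def verify(manifest: dict, artifacts: dict) -> list:
    """Return the names whose hashes do not match; an empty list means it held."""
    mismatches = []
    for entry in manifest["artifacts"]:
        data = artifacts.get(entry["name"])
        if data is None or sha256_hex(data) != entry["sha256"]:
            mismatches.append(entry["name"])
    if sha256_hex(canonical_bytes(manifest["artifacts"])) != manifest["manifest_sha256"]:
        mismatches.append("<manifest>")
    return mismatches


artifacts = {"events.csv": b"id,action\n1,denied\n", "notes.txt": b"scope: Q1"}
manifest = build_manifest(artifacts)
print(verify(manifest, artifacts))

tampered = dict(artifacts, **{"events.csv": b"id,action\n1,granted\n"})
print(verify(manifest, tampered))
```

Hashing the manifest as well as the artifacts is what stops someone from swapping an artifact and quietly updating its listed hash to match.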

Who can read and export

Reading the audit log takes the audit.read scope and exporting it takes audit.export, and the two are held by the Owner, Admin, and Billing Manager roles, the seats your organization trusts with oversight and compliance. The other built-in roles do not see the audit log at all. Export also depends on a plan entitlement, and the button is hidden when the entitlement is absent rather than failing with an error a person cannot place, so who can pull evidence is a deliberate, narrow grant.
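The access rule above combines a role check with an entitlement check. Here is a minimal Python sketch of that logic. The three role names and two scope names come from this page; the `Member` role, the function names, and the boolean entitlement flag are hypothetical stand-ins.

```python
# Roles this page names as holding both audit scopes.
AUDIT_ROLES = {"Owner", "Admin", "Billing Manager"}


def scopes_for(role: str) -> set:
    """Audit scopes held by a role; other built-in roles hold none."""
    return {"audit.read", "audit.export"} if role in AUDIT_ROLES else set()


def can_read_audit_log(role: str) -> bool:
    return "audit.read" in scopes_for(role)


def show_export_button(role: str, has_export_entitlement: bool) -> bool:
    """The button appears only when both the scope and the plan entitlement are present."""
    return "audit.export" in scopes_for(role) and has_export_entitlement


print(can_read_audit_log("Admin"))
print(can_read_audit_log("Member"))  # "Member" is a hypothetical non-audit role
print(show_export_button("Owner", has_export_entitlement=False))
```

Hiding the button instead of erroring keeps the two conditions in one place, so a person never sees an action they cannot complete.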

Three records that corroborate

The audit log is one of three records the platform keeps, and a review is strongest when they agree. The audit log says what changed. Sessions keeps the agent's turn-by-turn transcript (how it happened): every tool call and file edit, with credentials redacted. Version history keeps each saved version of a record (what it looked like at every step), and a restore is non-destructive and itself audited. All three carry the same person-or-agent mark, so a change attributed to an agent in one record is attributed to that agent in the others, and a reviewer can check one against another rather than trust a single trail.

The audit log says what changed. Version history keeps each saved version of a record, so a restore returns to the same state rather than an approximation. Sessions keeps the agent's turn-by-turn transcript with credentials redacted. All three carry the same person-or-agent mark, so a change attributed to an agent in one record is attributed to that agent in the others, and a reviewer can check one against another rather than trust a single trail.
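Checking one record against another can be as simple as comparing the mark each one carries for the same change. The Python sketch below assumes each record has been reduced to a mapping from a change id to an (edit source, actor) pair; that reduction, the ids, and the function name are illustrative assumptions, not an export format the platform defines.

```python
def disagreements(audit, versions, sessions):
    """Return change ids whose mark differs between the audit log and another record."""
    differing = []
    for change_id, mark in audit.items():
        # A change may be absent from a record (e.g. a person's edit has no
        # agent session), so only compare where the record has an entry.
        others = [rec[change_id] for rec in (versions, sessions) if change_id in rec]
        if any(other != mark for other in others):
            differing.append(change_id)
    return sorted(differing)


audit = {
    "c1": ("agent", "Codex agent"),
    "c2": ("person", "Sarah"),
    "c3": ("agent", "Codex agent"),
}
versions = {
    "c1": ("agent", "Codex agent"),
    "c2": ("person", "Sarah"),
    "c3": ("person", "Sarah"),  # deliberately inconsistent, to show detection
}
sessions = {"c1": ("agent", "Codex agent")}

print(disagreements(audit, versions, sessions))
```

In a healthy workspace the result is empty; the value of the check is that a reviewer can run it rather than take agreement on trust.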

What a reviewer can verify

The point of all of it is that a review reads the record rather than trusts a description:

Every meaningful change is attributed, with the actor and a mark that says person, agent, or platform, so an agent's work is enumerated apart from your team's.
Boundary events are on the record: every credential lease granted or denied, each denial carrying its reason so a refusal reads as "denied, lifetime exceeds policy" rather than an opaque no, and every secret the platform refused to inject, the secret shown by name and never by value.
Evidence exports cleanly: a CSV scoped to the filters a question needs, with the act of exporting itself recorded.
Security events carry a bounded shape: sensitivity-classified and payload-limited so a credential-adjacent event never holds the credential, and streamable to your own Log Analytics when you wire it.
A formal package is hash-verified, so a reviewer can confirm the evidence they hold is unchanged from what the platform produced.

Answering an auditor, end to end

An external review asks Sarah the questions reviews always ask: show us that changes are attributed, that an agent's work is distinguishable from a person's, and that you can produce the record on demand. A year ago that would have meant a scramble. Here it is a filter and an export. She opens the Audit Log, filters to AI-sourced changes, and the rows are exactly the agent edits, each with the agent named, the record it touched, and the fields that moved. She exports that view to a CSV scoped to those filters, and the export is itself logged, so the evidence handoff is on the trail too. When the reviewer asks about the security boundary, she points at the security event stream her team already pulls into their Log Analytics, where the bundle issuances and the credential leases land with their sensitivity marks and no secret values. The questions that used to take a week take an afternoon, because every answer is a record she can produce rather than an assurance she has to give.

Why audit and evidence work this way

The tempting shortcut is to log enough to debug and call it evidence. It fails the moment a reviewer asks a question the logs were not shaped to answer, or asks whether the record could have been quietly edited after the fact. So the model is built the other way around: the record is shaped for the questions a review asks, the agent's hand is marked on every row so AI work is never indistinguishable from human work, the slice a reviewer needs exports on its own, and the security boundary keeps a second record bounded at the point of writing so a credential never rides into the evidence. The platform manages the retention, and the formal package is hash-verified so its integrity is something a reviewer checks rather than takes on faith.

None of it claims a certification. What it gives you is the thing a certification rests on: a complete, attributed, exportable record of what happened, with the agent's work marked apart and the security boundary kept on its own evidence trail. The record is built to be handed over, which is the only test of an evidence trail that matters.

For the person who owns compliance, this is the page you bring to an audit. The record is attributed, the agent's hand is marked apart, the slice a reviewer asks for exports on its own, and the formal handoff is a hash-verified package. You answer the questions from the record rather than from memory.

For the person who owns security, the security-evidence pipeline is the detail to check: boundary events with a fixed shape, a sensitivity on every one that bounds its payload so a credential-adjacent event never carries the credential, and an opt-in stream to your own Log Analytics. The audit log and the security stream agree on the facts and serve two different retention needs.

For a team lead, the audit log answers "who changed this, and when" in a filter, and because the row names the agent rather than only the time, a record changing on its own is never a mystery. You spend the time on the work, not on reconstructing who touched what.

For a prospect evaluating the platform, this is what makes agent work defensible to adopt. The agent does real work, and every piece of it is on the record with the agent named, exportable for a review, and kept on a security evidence trail you can stream to your own tooling. The proof is the record, not the promise.

Audit trails

The everyday mechanics: the three records, Sessions, and version history.

Human oversight and approvals

The approval decisions that land on this record.

Data access

What an agent reaches in your data, and what it never can.

Security overview

The whole model the evidence trail proves.