Data and tenancy model

Where your work is stored and how one workspace's data stays one workspace's data. Cosmos DB partitioned by workspace, history in a shared versions container, large fields that spill to blob storage, sessions in Azure Table, and schema changes that run as idempotent, hash-checked migrations.

Every record you create in Disco Parrot (an initiative, a plan, a bug, a sprint, a saved report) lands in a database, and the first question a technical reader asks is the right one: where does it go, and what keeps one workspace's data inside that workspace? This page is the architecture answer: how the data layer is built, so you can reason about durability, isolation, and history. It is not the security account. How that data is protected, who can read it, and what an agent can reach are covered on the Data access and Encryption and data handling pages.

The short version: one Azure Cosmos DB account; nearly every container partitioned by workspace; each kind of record in its own container, with containers grouped by product domain into shared-throughput databases; version history in a shared container; oversized text offloaded to blob storage; and sessions in Azure Table. The platform reaches all of it as a managed identity, with no stored database key. The rest of this page walks through each decision and why it holds.

The workspace is the partition

Disco Parrot stores its records in Azure Cosmos DB, and nearly every container is partitioned by the same key: /tenantId, the workspace the record belongs to. A partition key is the field Cosmos uses to physically group documents, and it is also the unit a query is scoped to. Every read the platform issues names the workspace it is reading for, and the query runs against that one partition, so a read for one workspace does not range over another's data. The platform does not issue an unkeyed, cross-partition read for tenant data: a read is scoped to one workspace, or it is not a read the platform makes. The workspace it is scoped to comes from the authenticated request, not from a value the caller is free to choose, so a read cannot be widened by asking for a different one.

The write side has the matching guarantee. The partition value comes from the record itself: the platform sets tenantId on the document body when it writes, and Cosmos reads the partition from that field. The useful part is what happens if that field is ever missing or wrong: the write is rejected at the SDK boundary rather than quietly filed in the wrong place. The workspace boundary is structural in both directions, not a convention the code has to remember to honor on every call.
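The read and write guarantees above can be sketched together with an in-memory stand-in for a container partitioned on /tenantId. This is a minimal illustration, not Disco Parrot's code: `RequestContext`, `TenantScopedContainer`, and `PartitionMismatchError` are hypothetical names, and the rejection is modelled as an explicit check in a wrapper rather than as any particular Cosmos SDK behavior.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, Optional


class PartitionMismatchError(ValueError):
    """A document's tenantId is missing or does not match the target partition."""


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical: built by the auth layer from a verified session,
    # never from a tenant id the caller supplies.
    tenant_id: str


class TenantScopedContainer:
    """In-memory stand-in for a container partitioned on /tenantId."""

    def __init__(self) -> None:
        self._partitions: dict[str, dict[str, dict]] = {}

    def upsert(self, ctx: RequestContext, doc: dict) -> None:
        # The platform stamps the workspace onto the body; any tenantId
        # the caller put there is overwritten, not trusted.
        body = {**doc, "tenantId": ctx.tenant_id}
        self._write(body, partition_key=ctx.tenant_id)

    def _write(self, body: dict, partition_key: str) -> None:
        # The partition is read from the body. A missing or mismatched
        # tenantId is refused instead of being filed somewhere else.
        if body.get("tenantId") != partition_key:
            raise PartitionMismatchError(
                f"tenantId {body.get('tenantId')!r} does not match {partition_key!r}"
            )
        self._partitions.setdefault(partition_key, {})[body["id"]] = body

    def read(self, ctx: RequestContext, record_id: str) -> Optional[dict]:
        # Point read inside the caller's own partition only.
        return self._partitions.get(ctx.tenant_id, {}).get(record_id)

    def query(self, ctx: RequestContext, predicate: Callable[[dict], bool]) -> list[dict]:
        # Single-partition query: no code path iterates other workspaces.
        return [d for d in self._partitions.get(ctx.tenant_id, {}).values() if predicate(d)]
```

Note that neither `read` nor `query` takes a workspace argument at all; the only way to change which partition is read is to authenticate as a different workspace.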

A small number of containers partition by something other than the workspace, because their records are not workspace-scoped. A domain claim partitions by the domain being claimed, and a few platform-internal records (the security-evidence and network-boundary verification packages) partition by the claim they belong to. Those are the intended exceptions. Everything that belongs to your workspace is keyed to your workspace.

This is a logical partition, a scoped boundary in a shared account, not a separate database or a per-workspace encryption key. The full account of what that means for isolation, and the controls layered on top of it, lives on the Data access and Encryption and data handling pages. The architecture point here is narrow: the workspace id is the axis the whole store is organized around.

Every workspace-scoped container is partitioned by workspace. A read is scoped to one partition, never another's.

A container per kind of record, throughput shared across the database

Cosmos organizes data as databases that hold containers, and Disco Parrot groups them by domain. Identity and sessions sit in one database, integrations and repositories in another, the planning work model in another, the runtime and sandbox records in another, teams and reporting in another. Inside each, a container holds one kind of record: initiatives in one, plans in another, sprints in another, audit entries in another.

Throughput is bought at the database level, not per container. Each database runs on a shared autoscale budget that every container in it draws from, so a quiet container costs nothing extra and a busy one borrows headroom from its neighbors instead of needing its own provisioned capacity. This is the practical reason the data is spread across a handful of databases rather than one: Cosmos caps how many containers share a single throughput offer, and grouping by domain keeps related, co-accessed records on the same budget.
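Two Cosmos DB facts sit behind that paragraph: a shared-throughput database holds at most 25 containers, and autoscale throughput moves between 10% of its configured maximum and the maximum. The sketch below checks a domain layout against the first and computes the range of the second. The database and container names in `LAYOUT`, and both helper functions, are hypothetical illustrations, not the platform's actual layout.

```python
from __future__ import annotations

# Cosmos DB limit: a database with shared throughput can hold at most 25 containers.
MAX_CONTAINERS_PER_SHARED_DB = 25


def autoscale_range(max_ru_per_sec: int) -> tuple[int, int]:
    """Autoscale scales between 10% of the configured maximum and the maximum."""
    return max_ru_per_sec // 10, max_ru_per_sec


def check_layout(layout: dict[str, list[str]]) -> list[str]:
    """Return the databases whose container count exceeds the shared-throughput cap."""
    return [db for db, containers in layout.items()
            if len(containers) > MAX_CONTAINERS_PER_SHARED_DB]


# Hypothetical grouping by domain; every container in a database draws on one budget.
LAYOUT = {
    "identity": ["users", "memberships"],
    "integrations": ["connections", "repositories"],
    "planning": ["initiatives", "plans", "sprints", "bugs"],
    "runtime": ["runs", "sandboxes"],
    "teams": ["teams", "reports"],
}
```

A database that fails `check_layout` is the signal to split a domain, which is the pressure that produces a handful of databases rather than one.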

Each store knows its own address

There is no central table that maps a record type to its home. Each store carries its own address: it declares the database and container it belongs to as a small descriptor on the class itself, and a single resolver reads that descriptor to hand back the right container handle. Adding a new kind of record means giving its store a descriptor, not editing a registry that every other store also depends on, so a new domain cannot break an existing one by touching shared routing code.
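A minimal sketch of the "store carries its own address" pattern, assuming a class decorator as the descriptor and a resolver that caches one handle per address. `stored_in`, `StoreAddress`, `ContainerResolver`, and the store classes are hypothetical names; the `open_container` callback stands in for whatever actually opens a container.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class StoreAddress:
    database: str
    container: str


def stored_in(database: str, container: str):
    """Class decorator: attach the store's own database/container address."""
    def attach(cls: type) -> type:
        cls.__store_address__ = StoreAddress(database, container)
        return cls
    return attach


@stored_in("planning", "initiatives")
class InitiativeStore:
    pass


@stored_in("planning", "sprints")
class SprintStore:
    pass


class ContainerResolver:
    """Single resolver: reads the descriptor and hands back a container handle."""

    def __init__(self, open_container: Callable[[str, str], Any]) -> None:
        self._open = open_container  # (database, container) -> handle
        self._handles: dict[StoreAddress, Any] = {}

    def resolve(self, store_cls: type) -> Any:
        address = getattr(store_cls, "__store_address__", None)
        if not isinstance(address, StoreAddress):
            raise TypeError(f"{store_cls.__name__} declares no store address")
        # Open each container once, then reuse the handle.
        if address not in self._handles:
            self._handles[address] = self._open(address.database, address.container)
        return self._handles[address]
```

With the azure-cosmos Python SDK, `open_container` could be `lambda db, c: client.get_database_client(db).get_container_client(c)`. Adding a store is one decorated class; nothing shared has to change.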

History lives in one shared container

When you edit an initiative, a plan, a skill, or an agent instruction, the previous state is not overwritten and lost. A full snapshot is appended to a single shared container, entityVersions, that holds version history for every versioned entity type at once. Each snapshot records what kind of entity it was, which specific record, and the version number, so the history of any one record reads back in order.

That container carries a composite index tuned to exactly the query the history view makes: find the snapshots for this entity type and this record, newest first. A composite index lets Cosmos satisfy a two-field match plus an ordering in one efficient lookup instead of scanning. The result is that pulling up "every prior version of this plan, latest first" stays fast no matter how long the history grows.
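To make the index point concrete, here is a toy in-memory model of what a composite index over entity type, record id, and version buys: equality on the first two fields lands on one sorted run of entries, and reading that run backwards gives newest-first without a sort or a scan. The field names `entityType`, `entityId`, and `version` are assumptions, and the real container is indexed by Cosmos DB, not by a class like this.

```python
from __future__ import annotations
import bisect
from collections import defaultdict
from typing import Optional


class VersionHistoryIndex:
    """Toy model of a composite index on (entityType, entityId, version)."""

    def __init__(self) -> None:
        # (entityType, entityId) -> version numbers, kept sorted ascending
        self._keys: dict[tuple[str, str], list[int]] = defaultdict(list)
        self._docs: dict[tuple[str, str, int], dict] = {}

    def add(self, snapshot: dict) -> None:
        key = (snapshot["entityType"], snapshot["entityId"])
        bisect.insort(self._keys[key], snapshot["version"])
        self._docs[(*key, snapshot["version"])] = snapshot

    def history(self, entity_type: str, entity_id: str,
                limit: Optional[int] = None) -> list[dict]:
        # Roughly the query shape: WHERE entityType = @t AND entityId = @id
        # ORDER BY version DESC. Only this record's run is touched.
        versions = self._keys.get((entity_type, entity_id), [])
        newest_first = versions[::-1][:limit]
        return [self._docs[(entity_type, entity_id, v)] for v in newest_first]
```

The cost of `history` depends on the length of one record's run, not on how many snapshots the whole container holds, which is the property the composite index provides at scale.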

Keeping all history in one container, rather than a versions table per entity, is what lets a single helper handle the snapshot-on-change decision generically. The same append-if-changed logic covers plans and skills and agent instructions without each one reimplementing it. The reader-facing account is on the Entity versioning and history page; the architecture point is that one well-indexed container backs all of it.
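A minimal sketch of a generic append-if-changed helper, assuming change is detected by hashing a canonical JSON form of the record and comparing it with the latest snapshot for the same entity. `snapshot_if_changed`, the `hash` field, and the list standing in for the shared container are hypothetical; this version snapshots each saved state, so every earlier state stays in the list.

```python
from __future__ import annotations
import hashlib
import json
from typing import Optional


def content_hash(state: dict) -> str:
    # Canonical JSON (sorted keys) so key order does not change the hash.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode("utf-8")).hexdigest()


def snapshot_if_changed(versions: list[dict], entity_type: str,
                        entity_id: str, state: dict) -> Optional[dict]:
    """Append a full snapshot to the shared versions list only if content changed."""
    # A linear filter for brevity; the real container answers this with its index.
    history = [v for v in versions
               if v["entityType"] == entity_type and v["entityId"] == entity_id]
    latest = max(history, key=lambda v: v["version"], default=None)
    digest = content_hash(state)
    if latest is not None and latest["hash"] == digest:
        return None  # identical to the last snapshot: nothing to append
    snapshot = {
        "entityType": entity_type,
        "entityId": entity_id,
        "version": 1 if latest is None else latest["version"] + 1,
        "hash": digest,
        "state": dict(state),
    }
    versions.append(snapshot)
    return snapshot
```

Because the helper only needs the entity type and id, a plan, a skill, and an agent instruction all go through the same call.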

Large fields spill to blob, transparently

Most records are small. A few hold large free text: a skill's prompt, an initiative's body. Each of those fields has an inline threshold of around 58 KB, applied per field rather than to the record as a whole. When a field crosses it, the platform writes the value to Azure Blob storage and leaves a short reference in its place, so the threshold decides where the text is stored, not how much you can write. When the record is read back, the platform sees the reference and fetches the real value from blob before handing it to the caller. A short skill prompt stays inline; a long pasted spec spills to blob; the store that reads either one back cannot tell the difference.

The round trip is invisible to the code that reads and writes the record. A store asks for an initiative's body and gets the body, whether it lived inline or in blob; nothing upstream has to know which. The overflow also shares the record's lifecycle: when the record is deleted, the blob that held its overflow is deleted with it, so there is no separate object left to track or to orphan. This keeps the common case, small records, fast and cheap, while letting the occasional long document exist without a hard limit or a special code path that callers have to remember.
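The spill, rehydrate, and delete cycle can be sketched with two dictionaries standing in for the Cosmos container and the blob container. This is an illustration under stated assumptions: the 58 KB threshold is modelled as 58 × 1024 UTF-8 bytes, and the `$blobRef` marker, the `record_id/field` blob naming, and `OverflowStore` itself are hypothetical.

```python
from __future__ import annotations
from typing import Optional

# Assumption: "around 58 KB" is modelled as 58 * 1024 UTF-8 bytes per field.
INLINE_LIMIT_BYTES = 58 * 1024
BLOB_REF_KEY = "$blobRef"  # hypothetical marker left in place of an offloaded value


def _blob_ref(value: object) -> Optional[str]:
    """Return the blob name if value is an overflow reference, else None."""
    if isinstance(value, dict) and BLOB_REF_KEY in value:
        return value[BLOB_REF_KEY]
    return None


class OverflowStore:
    def __init__(self) -> None:
        self.records: dict[str, dict] = {}  # stand-in for the Cosmos container
        self.blobs: dict[str, str] = {}     # stand-in for a Blob container

    def write(self, record_id: str, record: dict, large_fields: tuple[str, ...]) -> None:
        stored = dict(record)
        for name in large_fields:
            blob_name = f"{record_id}/{name}"
            value = stored.get(name)
            if isinstance(value, str) and len(value.encode("utf-8")) > INLINE_LIMIT_BYTES:
                # Write the blob first, so a stored reference never points at nothing.
                self.blobs[blob_name] = value
                stored[name] = {BLOB_REF_KEY: blob_name}
            else:
                # The field now fits inline: drop any overflow from an earlier write.
                self.blobs.pop(blob_name, None)
        self.records[record_id] = stored

    def read(self, record_id: str) -> Optional[dict]:
        stored = self.records.get(record_id)
        if stored is None:
            return None
        # Swap each reference for the real value; callers never see the marker.
        hydrated = {}
        for key, value in stored.items():
            ref = _blob_ref(value)
            hydrated[key] = self.blobs[ref] if ref is not None else value
        return hydrated

    def delete(self, record_id: str) -> None:
        # Remove the record, then its overflow: the blob shares the record's lifecycle.
        stored = self.records.pop(record_id, None)
        for value in (stored or {}).values():
            ref = _blob_ref(value)
            if ref is not None:
                self.blobs.pop(ref, None)
```

In a multi-tenant store the blob name would also need the workspace id; it is left out here to keep the sketch short.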

Sessions and the fast-moving records sit in Azure Table

Not every kind of record belongs in Cosmos. A signed-in session is small, written constantly, read on every request, and disposable after a week. That shape fits Azure Table storage, a simpler key-value store, better than a Cosmos container, so that is where sessions live, in a table partitioned by workspace with a seven-day lifetime. The same module that backs the table is what provides the blob-overflow helper above, since both are part of the storage-account tier rather than the Cosmos tier.
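Azure Table entities are addressed by a PartitionKey and a RowKey, and Table storage has no built-in per-entity expiry. A minimal sketch, assuming the workspace id is the PartitionKey, the session id is the RowKey, and the seven-day lifetime is enforced by checking a stored expiry on read. The `expiresAt` and `userId` property names and both helpers are hypothetical.

```python
from __future__ import annotations
from datetime import datetime, timedelta

SESSION_LIFETIME = timedelta(days=7)


def new_session_entity(tenant_id: str, session_id: str, user_id: str,
                       now: datetime) -> dict:
    # PartitionKey groups a workspace's sessions; RowKey identifies one session.
    return {
        "PartitionKey": tenant_id,
        "RowKey": session_id,
        "userId": user_id,
        "expiresAt": (now + SESSION_LIFETIME).isoformat(),
    }


def is_live(entity: dict, now: datetime) -> bool:
    # No native TTL in Table storage, so expiry is checked when the entity is read.
    return datetime.fromisoformat(entity["expiresAt"]) > now
```

With the azure-data-tables SDK, the per-request lookup would be a point read such as `TableClient.get_entity(partition_key=..., row_key=...)`; expired entities still need a separate sweep to be physically removed.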

The split follows the data. Cosmos holds the durable work model, the records you query, version, and report on. The storage account holds the high-churn, short-lived, or oversized data, sessions and blob overflow, that does not need Cosmos's indexing and would only cost more to keep there.

Three stores, each holding the data that fits it. Cosmos for the work model, Table for churn, Blob for overflow.

Changing the shape of stored data

A product changes, and so does the shape of what it stores. Disco Parrot moves data with explicit migrations that are written to be safe to run more than once. A migration writes with an upsert, so a re-run after a partial failure rewrites the same record over itself, in effect a no-op, rather than duplicating it or failing. The original container is left in place until the migration has run and been observed in production for a period, so nothing is deleted on the strength of a single run.
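The difference between a plain insert and an upsert is what makes a re-run safe. This sketch uses an in-memory stand-in whose `create_item` and `upsert_item` mirror the names of the azure-cosmos `ContainerProxy` methods but are not the SDK; `migrate` and the transform passed to it are hypothetical.

```python
from __future__ import annotations
from typing import Callable


class ConflictError(Exception):
    """Raised when an insert targets an id that already exists."""


class InMemoryContainer:
    """Stand-in that mirrors the create/upsert distinction of a Cosmos container."""

    def __init__(self) -> None:
        self.items: dict[tuple[str, str], dict] = {}

    def create_item(self, body: dict) -> None:
        key = (body["tenantId"], body["id"])
        if key in self.items:
            raise ConflictError(key)  # a plain insert fails on a re-run
        self.items[key] = body

    def upsert_item(self, body: dict) -> None:
        # Insert if absent, replace if present: re-running writes the same result.
        self.items[(body["tenantId"], body["id"])] = body


def migrate(source: list[dict], target: InMemoryContainer,
            transform: Callable[[dict], dict]) -> None:
    # The source is only read; it is never modified or deleted here.
    for record in source:
        target.upsert_item(transform(record))
```

If a run dies halfway, running it again from the start converges on the same target contents, so recovery needs no bookkeeping about where the first run stopped.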

Correctness is checked rather than assumed. After a migration runs, a verification pass compares a content hash of every source record against the record that landed in the target, and reports any row that is missing or whose hash does not match. A divergence is a named, surfaced result, not a silent drift discovered later. The larger migrations run as a three-step operation a person drives: audit what would change, dry-run it without writing, then execute. There is no in-place destructive rewrite that runs on its own.
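A verification pass of this kind can be sketched as a walk over the source that reports missing and mismatched records by name. Assumptions: the hash is SHA-256 over canonical JSON (the text does not say which hash is used), and Cosmos system properties (`_rid`, `_self`, `_etag`, `_attachments`, `_ts`) are excluded because they differ between containers even when the content is identical. `verify`, `VerificationReport`, and the optional `transform` are hypothetical.

```python
from __future__ import annotations
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

# Cosmos-managed properties that differ between containers for identical content.
SYSTEM_FIELDS = ("_rid", "_self", "_etag", "_attachments", "_ts")


def content_hash(record: dict) -> str:
    payload = {k: v for k, v in record.items() if k not in SYSTEM_FIELDS}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class VerificationReport:
    missing: list[str] = field(default_factory=list)
    mismatched: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.missing and not self.mismatched


def verify(source: dict[str, dict], target: dict[str, dict],
           transform: Callable[[dict], dict] = lambda r: r) -> VerificationReport:
    """Compare each source record's expected hash with what landed in the target."""
    report = VerificationReport()
    for record_id, record in source.items():
        landed = target.get(record_id)
        if landed is None:
            report.missing.append(record_id)
        elif content_hash(transform(record)) != content_hash(landed):
            report.mismatched.append(record_id)
    return report
```

A non-empty report is the named, surfaced result the paragraph describes; what a person does with it (re-run, fix the transform, or hold the cutover) stays a human decision.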

A migration moves through gates a person drives, and a verification pass checks it. Nothing destructive runs on its own.

How it fits together for one workspace

When Sarah's team starts using Disco Parrot, their first initiative is a document in the planning database, in the initiatives container, partitioned to their workspace. Editing it appends a snapshot to the shared versions container, indexed so the history reads back newest-first. If the body grows past the per-field threshold, it moves to blob with a reference left behind, and no one writing or reading it has to know. Their sessions live in Azure Table with a seven-day lifetime. Every one of those reads is scoped to their workspace by partition key, and the platform reaches all of it as a managed identity with no stored account key to leak. Backups and regional resilience belong to the managed tiers and the terms of their agreement, covered on the Encryption and data handling page; the data model's own job is to keep what they have, never overwriting it in place.

Why the data model works this way

The expensive failures in a multi-tenant store are the ones that cross a boundary or lose history: a query that returns another customer's row, an edit that overwrites the only copy of what came before, a migration that silently drops records. The model is built so those failures are prevented by structure rather than by remembering to code carefully. Isolation rides on the partition key, read from the record itself, with a mispartitioned write refused at the boundary. History is append-only in a shared, indexed container, so the prior state of a record is always recoverable. Migrations are idempotent and hash-checked, so re-running one is safe and a divergence is caught by a verification pass instead of a support ticket. The throughput is pooled per database so cost tracks real use rather than per-container guesses. Each decision is the kind that holds up when the system is busy and no one is watching it closely, which is the only time the guarantees actually matter.

For a developer building on the model, the shape to keep in mind is one container per kind of record, every one partitioned by workspace, with each store carrying its own database-and-container address rather than looking one up in a central registry. History is a shared entityVersions container, large text fields transparently overflow to blob, and sessions live in Azure Table rather than Cosmos. You write against the record and its tenantId; the partition, the overflow, and the snapshot are the platform's job, not yours.

For a platform engineer, the database to watch is the one whose shared autoscale budget runs hot under its busiest container. Schema changes run as idempotent, hash-verified, audit-then-dry-run-then-execute migrations, and the composite index on the versions container is what keeps history queries off a scan as the history grows.

For an enterprise reviewer, the isolation answer is a logical partition by workspace, enforced at the SDK boundary on writes and never widened by an unkeyed read, reached only through a managed identity with no standing database key. It is not a per-tenant database or a per-tenant key, and the controls that sit on top of the partition are covered on the Data access and Encryption and data handling pages.

For a prospect evaluating durability, the thing to notice is that your work is never overwritten in place. Editing a record snapshots the prior version, migrations are checked rather than trusted, and nothing destructive runs without a person driving it. Durability beyond the application, backups and regional resilience, is part of your agreement and described on the Encryption and data handling page. The data model is built to keep what you have, not just to store the latest copy.

System overview

The single service and SPA the data layer sits inside.

Data access

What an agent can reach in your data, and what it never can.

Encryption and data handling

How the data at rest and in transit is protected.

Entity versioning and history

The version history the shared container backs.