# SimOps: The Digital Twin That Learns

## Integrated Modelling, Simulation, Procurement, and Operations on a Continuously Calibrating Substrate

**Project:** kask-simops on Agent Bestiary World
**Date:** 2026-05-20
**Status:** Internal and confidential — kask.bio · axelotl.partners

---

## Abstract

Most process-modelling tools sell a static answer: a spreadsheet, a
discrete-event simulator, or a BI dashboard. They model your
process once and let the model atrophy. The synthetic data they
generate stays synthetic, never tested against reality. When real
sensors eventually arrive, they go to a different system entirely
and the model never learns.

**SimOps** is a process-modelling, simulation, procurement, and
operational-twin substrate built on Agent Bestiary World (ABW). It
collapses those four functions into a single git-backed artifact:
the **workspace**. The workspace is simultaneously the process
design, the simulation testbed, the procurement record, and the
operational digital twin. An agent fleet operates on this artifact
continuously; every simulation, every procurement decision, every
sensor reading becomes a `sosa_observation` row that feeds a
recursive self-improvement (RSI) loop spanning five distinct
feedback channels.

We argue that this architecture solves a different problem than
existing tools attempt to solve. The familiar tools ask "what is
the answer for this process?" SimOps asks "how does the model of
this process improve as it runs?" The answer compounds.

Critically, we treat **synthetic data as training data** from the
first simulation. The agent fleet learns from synthetic
observations from t=0 and recalibrates priors as live sensor data
arrives. This inverts the usual synthetic-data play: synthetic
data isn't a placeholder for reality; it is the bulk of the
training set, and the system's job is to calibrate the synthetic
generator itself against any real data that arrives.

---

## 1. The Architectural Rule

The architectural rule for ABW is simple and load-bearing:

> **If it thinks, it's an agent. If it doesn't, it's a skill, tool,
> read, or write.**

This rule cuts cleanly through every design decision in SimOps:

- The **cascade** — a deterministic energy/mass-balance computation
  over a multi-stage process — is a tool. No thinking, no learning.
  Fast. Served at `/api/simops/cascade` and called directly by the
  live UI for sub-second feedback as the user edits the process.
- The **agents** — `simops_cascade`, `simops_predictor`,
  `simops_advisor`, `simops_optimizer`, `simops_narrator`,
  `comparator`, `sidestream_miner`, `supply_chain_oracle`,
  `regulatory_scanner`, `product_scout`, `valuechain_mapper`,
  `marketing_composer`, `simops_companion` — are the layer that
  learns when to fire the tools, what to read from their outputs,
  and what to write back as observations the next agent can
  consolidate against.

This separation is what makes the loop close. Tools don't pretend
to learn. Agents don't pretend to be deterministic. The rule is
uniform across the platform.
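
A minimal sketch of the tool side of this rule, assuming a far
simpler mass balance than the real cascade engine: each stage
multiplies its input by `efficiency` and accrues cost and carbon
per unit of input. The `Stage` and `cascade` names, the stage
names, and the arithmetic are illustrative assumptions, not the
kask-simops implementation. The point is only that the function
is pure: same inputs, same outputs, nothing learned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    # Field names mirror the per-stage declarations named in this paper;
    # how they combine below is an illustrative assumption.
    name: str
    efficiency: float           # fraction of input that becomes output
    opex_per_input_unit: float  # cost per unit of input to this stage
    carbon_intensity: float     # emissions per unit of input to this stage


def cascade(feed: float, stages: list[Stage]) -> dict:
    """Deterministic forward pass: no state, no randomness, no learning."""
    flow, opex, carbon = feed, 0.0, 0.0
    per_stage = []
    for stage in stages:
        opex += flow * stage.opex_per_input_unit
        carbon += flow * stage.carbon_intensity
        flow *= stage.efficiency
        per_stage.append({"stage": stage.name, "output": flow})
    return {"output": flow, "opex": opex, "carbon": carbon, "stages": per_stage}


# Hypothetical two-stage process, used only to exercise the function.
stages = [
    Stage("pretreatment", efficiency=0.5, opex_per_input_unit=2.0, carbon_intensity=0.25),
    Stage("fermentation", efficiency=0.5, opex_per_input_unit=4.0, carbon_intensity=0.5),
]
result = cascade(100.0, stages)
print(result["output"], result["opex"], result["carbon"])  # 25.0 400.0 50.0
```

An agent, by contrast, would decide whether and when to call
`cascade` and would write the result back as an observation.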

### 1.1 What This Looks Like in Practice

Four flavours of cascade invocation, each in its proper layer:

| Trigger | Path | Why |
|---|---|---|
| Live KPI strip while editing | Tool direct | Sub-second feedback; no thinking required |
| Named simulation in the Simulations tab | Agent (`simops_cascade`) | Each result becomes a `sosa_observation` for the fleet to consolidate against |
| Agent-initiated analysis (e.g. `simops_advisor` reasons "I should run a backward cascade") | Agent fires tool | The decision to run is an episode the strategist learns from |
| Periodic sensor reading on a bound contracted field | Tool writes SOSA directly | The sensor isn't thinking; it's emitting an observation in the SOSA shape |

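The first three describe what kask-simops does today (the second,
routing named simulations through `simops_cascade`, is still in
progress at the time of writing). The fourth relies on
infrastructure ABW already provides via the SOSA observation API;
the SimOps sensor binding that polls on a cadence is staged (§8).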

---

## 2. Synthetic Data Is Training Data

Most synthetic-data systems treat synthetic as a placeholder — what
you use until real data shows up. SimOps inverts that:

- **Synthetic observations are training data.** From the first
  simulation. They live in `sosa_observation` rows alongside any
  real readings that arrive later, shape-identical.
- **Sim #1** produces a real `sosa_observation`. Synthetic but
  structurally indistinguishable from a sensor reading.
- **By sim #10** the predictor (an OLS regression engine) can fit
  a model on 10 rows and start forecasting novel parameter
  combinations.
- **By sim #50** the fit has enough rows for its R² to be a
  trustworthy guide; the comparator narrates against forecasts with
  calibrated confidence intervals; routing decisions start
  accumulating Brier signal.
- **When real sensor data eventually arrives** it doesn't *start*
  the learning. It **recalibrates** priors that already exist.
  Synthetic stays in the training set; reality re-weights the model.

This reframes what looks like "we're early." A workspace with zero
sensors bound and 47 simulations run is not an empty system
waiting to start; it is a system whose entire agent fleet has been
training on the user's stated assumptions for 47 iterations.
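
To make the "sim #10" step concrete, here is a sketch of an OLS
fit over ten synthetic rows. It assumes a single explanatory
variable (electricity price) and a hypothetical `unit_cost`
outcome that responds linearly; the real predictor presumably
handles several assumptions at once. `fit_ols` is an illustrative
name, not the predictor's API.

```python
def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Closed-form OLS for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


# Ten synthetic rows: an assumed electricity price and the unit cost a
# deterministic cascade returned for it (values invented for illustration).
prices = [float(p) for p in range(1, 11)]
unit_costs = [3.0 + 2.0 * p for p in prices]

slope, intercept, r_squared = fit_ols(prices, unit_costs)
forecast = intercept + slope * 12.0  # a price no simulation has tried yet
```

Because the rows come from a deterministic tool, the fit on
synthetic data alone is near perfect; the interesting movement in
R² begins when noisier live readings join the same table.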

### 2.1 SimOps Calibrates the Synthesizer

Other tools have synthetic data generators as plugins, decoupled
from the deployment pipeline. The general synthetic-data problem
in ML is: how do you generate synthetic training data that
actually transfers to reality? Existing answers (GANs,
agent-based simulators, hand-crafted distributions) train on
synthetic, deploy in production, and hope the distributions match.

SimOps closes that loop because synthetic data and reality both
live in the same workspace:

- **Synthetic data is parametrised** by the user's
  `assumptions.yaml` (electricity price, discount rate,
  utilisation rate) and per-stage `efficiency`, `carbon_intensity`,
  `opex_per_input_unit` declarations.
- **The predictor learns the assumption-to-outcome relationship**
  by fitting against the `sosa_observation` rows the cascade has
  been emitting.
- **When real readings arrive**, the predictor compares them
  against what the synthetic-driven model expected. The delta is
  a calibration signal. The user sees:
  > "Your declared `efficiency: 0.18` produced predictions that
  > were 23% optimistic vs the first 8 live observations — the
  > predictor has shifted its estimate to 0.14."
- **A future `simops_synthesizer` agent** can read the current
  process, the assumptions, the existing training data, and the
  user's stated uncertainty bounds — and emit a more realistic
  synthetic time-series with appropriate noise. The agent learns
  what "realistic" means for each domain. Fermentation has
  different variance characteristics than electrolysis.

The SimOps loop **doesn't just train on synthetic data — it
calibrates the synthesizer**. That is a meaningfully different
contribution than "we ship a Monte Carlo plugin."
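
The calibration signal in the bullets above can be sketched as
follows. This assumes the simplest possible case: one declared
parameter compared directly with live readings of the same
quantity, and a pseudo-count blend as the update rule. The
function name, `prior_weight`, and the numbers are illustrative
assumptions, not the predictor's actual update.

```python
def calibrate(declared: float, live: list[float], prior_weight: float = 2.0) -> dict:
    """Compare a declared parameter with live readings and shift the estimate.

    The declared value is treated as if it were `prior_weight` observations,
    then averaged with the live readings. This blend is an assumption made
    for illustration only.
    """
    if not live:
        # No live data yet: the synthetic prior stands unchanged.
        return {"optimism": 0.0, "estimate": declared}
    live_mean = sum(live) / len(live)
    if live_mean == 0:
        raise ValueError("relative optimism is undefined for a zero live mean")
    # Positive optimism means the declaration was higher than reality.
    optimism = (declared - live_mean) / live_mean
    estimate = (prior_weight * declared + len(live) * live_mean) / (
        prior_weight + len(live)
    )
    return {"optimism": optimism, "estimate": estimate}


# Hypothetical example: a declared efficiency of 0.20 against eight
# live readings of 0.16.
report = calibrate(0.20, [0.16] * 8)
print(f"declared value was {report['optimism']:.0%} optimistic; "
      f"estimate shifted to {report['estimate']:.3f}")
```

With more live readings the estimate moves further from the
declaration, which is the sense in which reality re-weights the
synthetic prior rather than replacing it.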

---

## 3. The Loop

The compounding has a structure. We name five feedback channels,
each running at its own characteristic timescale. (The first four
are platform-level ABW infrastructure; the fifth is where
SimOps-specific calibration data accumulates.)

**Loop 1 — Per-agent learning** (hours). Eval-dimension scores
become semantic rules in the agent ontology. Each agent's
knowledge graph grows from its own experience.

**Loop 2 — HITL behavioural correction** (days). Anomalies surface
to a review queue. Reviewers correct; corrections become synthetic
training episodes at HumanAuthority weight.

**Loop 3 — Workspace coherence** (sessions). Coherence evaluation
keeps the agent team coordinated; the strategist publishes a
coordination brief that members read on subsequent turns.

**Loop 4 — Composition evolution** (weeks). Session patterns drive
strategist proposals for changing team membership.

**Loop 5 — Calibration / routing accuracy** (months). Brier scores
on resolved forecasts feed back as routing weights in the
domain-constrained MoE.
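
A sketch of how Loop 5 could turn resolved forecasts into routing
weights. The Brier score itself is standard (mean squared error
between forecast probabilities and 0/1 outcomes). Everything else
here is an assumption: that each forecast has been phrased as the
probability of a binary event, the softmax mapping from scores to
weights, and the `temperature` parameter.

```python
import math


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally many forecasts and outcomes, at least one")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def routing_weights(scores: dict[str, float], temperature: float = 0.1) -> dict[str, float]:
    """Softmax over negative Brier scores: a lower score earns a higher weight.

    This mapping is an illustrative assumption, not ABW's routing policy.
    """
    exps = {name: math.exp(-score / temperature) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Four resolved events, forecast by two agents (numbers are invented).
outcomes = [1, 1, 0, 1]
scores = {
    "simops_predictor": brier_score([0.9, 0.8, 0.2, 0.7], outcomes),
    "simops_advisor": brier_score([0.6, 0.5, 0.5, 0.4], outcomes),
}
weights = routing_weights(scores)
```

Here `simops_predictor` scores 0.045 against 0.255 for
`simops_advisor`, so it receives the larger routing weight for
this kind of question.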

For SimOps specifically, the calibration signal is grounded:

> **`calibration.signal: "sosa_observation"`** — every cascade
> prediction has a downstream observation it can be compared
> against, whether that observation is synthetic (another agent's
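> output) or live (a real sensor reading). The Brier score is well
> defined in either case, provided each forecast is expressed as
> the probability of an event that the observation then resolves.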

This is what makes SimOps a domain-constrained Mixture of Experts.
The architecture has four required properties — output contract,
input decomposer, calibration signal, synthesis protocol — and
SimOps has all four:

- **Domain**: process optimisation
- **Output contract**: `sosa_observation` rows + `CascadeResult`
  JSON
- **Calibration signal**: `sosa_observation` (synthetic and live
  alike)
- **Synthesis protocol**: pipeline (cascade → KPI → optimise)
- **Members**: `simops_cascade`, `simops_predictor`,
  `simops_optimizer` (core); `comparator`, `sidestream_miner`,
  `product_scout`, `regulatory_scanner`, `valuechain_mapper`,
  `marketing_composer` (kask extensions)

---

## 4. The Workspace as Single Artifact

The compounding only works if the agents have a coherent training
set. If process design lives in Confluence, simulations live in
MATLAB, procurement lives in NetSuite, and sensor readings live in
Grafana, the agents never see a connected timeline.

SimOps makes the workspace a single git-backed artifact:

- **One process** — `simops/process.yaml`
- **N variations** — `simops/variations/<slug>.yaml` (forked designs
  for comparison)
- **One assumption set** — `simops/assumptions.yaml` (parameters
  governing the synthesizer)
- **N simulations** — `simops/simulations/<id>.yaml`, each carrying
  its setup, per-arm cascade results, comparator narrative,
  recorded decision (if any), and the `sosa_observation` IDs for
  traceability
- **Insights** — folded from the message ledger at query time, not
  stored separately
- **Sensor bindings** — metadata on the SOSA-contracted fields,
  pointing at the source endpoints; the actual readings flow
  through `sosa_observation`
- **BoM provenance** — `stage.bom_resolution` capturing who priced
  each Bill of Materials, when, what risks they flagged, and what
  their headline observation was

Every event the agents need is on one timeline, in one workspace,
queryable from one place. That is the substrate the platform's
five RSI loops fire against.
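
As a rough illustration of what one simulation artifact carries,
the sketch below models the record described in the list above.
Only the directory layout (`simops/simulations/<id>.yaml`) comes
from this paper; the class name, field names, and example values
are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class SimulationRecord:
    # All field names are illustrative; the real YAML schema may differ.
    sim_id: str
    setup: dict
    arm_results: dict[str, dict]        # per-arm cascade results, keyed by arm
    comparator_narrative: str = ""
    decision: Optional[str] = None      # recorded decision, if any
    observation_ids: list[str] = field(default_factory=list)

    def path(self) -> PurePosixPath:
        """Location of this record inside the git-backed workspace."""
        return PurePosixPath("simops/simulations") / f"{self.sim_id}.yaml"


record = SimulationRecord(
    sim_id="sim-0001",
    setup={"horizon_days": 30},
    arm_results={"baseline": {"output": 25.0}, "variation-a": {"output": 27.5}},
    comparator_narrative="variation-a yields more output at higher opex",
    observation_ids=["obs-1", "obs-2"],
)
print(record.path())  # simops/simulations/sim-0001.yaml
```

Keeping the observation IDs on the record is what lets a later
reader trace a decision back to the rows the fleet trained on.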

This is also what makes the workspace a research deliverable. A
process design, an experiment, a dataset of observations, a
comparator narration, and a recorded decision all sit in the same
git history, which doubles as the paper's methods, results, and
discussion sections. Continuous research becomes a continuous
publication pipeline.

---

## 5. The Pitch: Compounding, Not Feature Parity

This is the unusual market position. The pitch is honest about
the early curve and explicit about the late curve.

**Day 1.** A spreadsheet with structured stages, SOSA contracts on
every field, an agent fleet that has zero training data and knows
it. Outputs are equivalent to what a careful operator would get
from a well-designed Excel model.

**Month 1.** The agent fleet has consolidated against the first
cohort of synthetic simulations. The predictor can forecast novel
variations. The comparator narrates trade-offs over enough runs
that its narratives reflect this specific process, not generic
templates.

**Month 6.** Sensors have been bound; live data reweights synthetic
priors; the routing strategist (Loop 5) has Brier data showing
which agent is best at which kind of question for *this* operation.
`simops_cascade` fires when the user wants deterministic answers;
`simops_predictor` fires when the user wants forecasts;
`simops_advisor` fires when the user wants recommendations —
automatically, because the routing strategist has learned the right
policy from grounded calibration signal.

**Year 1.** The workspace is a digital twin, the agent fleet is
specialised to this operation, the synthesizer knows the local
variance characteristics, and decisions are made against a
continuously-improving model of reality.

The value curve is a hockey stick. It sits below a static
spreadsheet in week one (because part of a static spreadsheet's
value is its finality: it stops moving and you can act on it). By
month six it is ahead of anything a static tool can offer (because
no static tool can compound the way an agent fleet trained on your
specific operation can).

**No other simulation tool sells you that.** They sell static
models, or AutoML pipelines decoupled from operations, or BI
dashboards that don't write back to the model. SimOps sells you a
model that learns inside the same artifact that runs your
operations, prices your inputs, and records your decisions.

This is a difficult pitch to land in a 30-second demo. It becomes
self-evident once a user has worked with the system for a month.

---

## 6. The Visible Artifact: Digital Twin Maturity

The compounding has to be **legible** to be sellable. We propose
a small set of visualisations whose purpose is to make the
temporal evolution of confidence visible:

- **Time-series projection per contracted field.** Synthetic
  baseline as a faint line, predictor's evolving forecast as a
  darker line with confidence band, live readings (when present)
  as discrete points. The band tightens as data accumulates.
- **Per-stage uncertainty cones on the sankey.** Instead of flat
  trapezoid bands showing mass flow, the bands fan out width-wise
  to show the cumulative confidence interval at each stage.
  High-R² stages stay tight; uncalibrated stages flare wide.
- **Calibration delta heatmap across the assumption space.** Two
  assumptions on the axes, shaded by where the predictor's
  forecasts have been most or least accurate. Operators see which
  corner of their parameter space they actually have data for.
- **Sidestream economics waterfall** animated over the simulation
  horizon. Bars accumulate primary revenue, then each sidestream's
  value, then risk-weighted adjustments, then net.
- **A twin clock in the workspace header.** Synthetic-tick counter
  and live-tick counter side by side. `Synthetic: 47 obs · Live: 0`
  becomes `Synthetic: 64 · Live: 89` as the workspace ages.
  Calibration shifts visibly with each live tick.
- **Trajectory replay.** Scrub a timeline through the workspace's
  history. Watch the cascade outputs, predictor coefficients, BoM
  costs evolve over the past N days. Procurement timing decisions
  and their consequences become visible.

The connecting thread: every visualisation makes **temporal
evolution of confidence** legible. The digital twin is not a
static snapshot; it is a system that is learning, and the
visualisations show the learning happening.
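
The twin clock is the simplest of these to sketch. The example
below assumes each observation carries a `quality_flag` whose
value is either `"synthetic"` or `"live"`; those two values, and
the `twin_clock` function, are assumptions made for illustration.

```python
from collections import Counter


def twin_clock(quality_flags: list[str]) -> str:
    """Format the header counter from a list of per-observation flags."""
    counts = Counter(quality_flags)
    # Counter returns 0 for a flag that has not been seen yet.
    return f"Synthetic: {counts['synthetic']} · Live: {counts['live']}"


early = twin_clock(["synthetic"] * 47)
later = twin_clock(["synthetic"] * 64 + ["live"] * 89)
print(early)  # Synthetic: 47 · Live: 0
print(later)  # Synthetic: 64 · Live: 89
```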

---

## 7. Why This Architecture Now

Three factors that were missing two years ago make this thesis
tractable now:

**LLMs that can hold context coherently across long sessions.**
The companion strategist holding a coherent workspace context
across multiple turns and dozens of artifacts (process, variations,
simulations, decisions) requires recent-generation models. Older
models could narrate a single simulation but not maintain the
operator's intent across a multi-month design-to-operations
trajectory.

**The SOSA observation API as a unified data substrate.** Every
sensor reading, every simulation result, every forecast, every
agent's intermediate computation has a shape: `sosa_observation`
rows with `observable_property`, `unit`, `result`, `procedure_id`,
`quality_flag`. That uniformity is what lets training data
accumulate across heterogeneous sources without bespoke
integration work.
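
A sketch of that shared shape, using the five column names listed
above. The Python types, the `SosaObservation` class, and all
example values (including the use of `quality_flag` to mark a row
as synthetic or live) are assumptions; the actual table may carry
more columns and different value conventions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SosaObservation:
    # Column names come from this paper; types and values are assumed.
    observable_property: str
    unit: str
    result: float
    procedure_id: str
    quality_flag: str


# A row emitted by a simulation run (hypothetical values).
synthetic = SosaObservation(
    observable_property="fermentation.efficiency",
    unit="fraction",
    result=0.18,
    procedure_id="simops_cascade",
    quality_flag="synthetic",
)

# A row emitted by a bound sensor (hypothetical values).
live = SosaObservation(
    observable_property="fermentation.efficiency",
    unit="fraction",
    result=0.14,
    procedure_id="sensor-binding-01",
    quality_flag="live",
)

# Both rows expose exactly the same columns, so one query serves both.
same_shape = asdict(synthetic).keys() == asdict(live).keys()
```

Because the two rows differ only in their values, a predictor can
train on both without source-specific parsing.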

**Recursive self-improvement at the platform level.** ABW's five
RSI loops are infrastructure, not application code. SimOps does
not implement consolidation, anomaly triage, coherence
evaluation, composition evolution, or Brier-calibrated routing —
those run platform-side, against any workspace, for any app. The
SimOps contribution is to make the workspace produce the signals
those loops consume.

---

## 8. What This Is Not

This whitepaper makes claims about compounding behaviour that have
to be earned over time in actual user operations. We are explicit
about what is shipped and what is staged:

- **Shipped (production):** process YAML editor with SOSA contract
  declarations on every bindable field; tagged-union synthetic/live
  field values; deterministic cascade engine with mass-balance and
  energy-balance modes; variations as forked designs; per-workspace
  assumption sets; simulations as first-class persistent artifacts;
  the supply_chain_oracle priced-BoM workflow with risk-flag
  surfacing and procurement provenance; companion agent with action
  protocol (edit_process, run_simulation, compare_variations,
  invoke_agent, fork_variation, declare_sosa_contract, annotate);
  background simulation runs with status visible in the Activity
  panel.
- **In progress at time of writing:** routing named simulations
  through the `simops_cascade` agent (so every sim writes a
  `sosa_observation` row). This is the single highest-leverage
  change because everything downstream depends on the training set
  accumulating.
- **Staged:** workspace maturity panel; sensor binding writing
  observations on a polling cadence; companion turn-context
  maturity awareness; calibration view on completed simulations;
  the six visualisations enumerated in §6.

The thesis is the architectural commitment. The compounding
trajectory in §5 will be measurable from a workspace's
observation count, predictor R², and Brier-calibrated routing
weights. We will write a follow-up paper when the first workspace
has accumulated enough data to make those numbers concrete.

---

## 9. The ROI Story in One Sentence

A static process model is worth what you built it for, once. A
SimOps workspace is worth what it learns about your operation over
its lifetime, and the integrated substrate ensures that learning
compounds across modelling, simulation, procurement, and operations
in a single artifact.

The pitch is "worse on day one, hockey-stick later, with
unprecedented automation and flexibility once the loop closes."

We are building toward this for real, in contrast to the
vapourware of static-twin and AutoML systems that promise learning
without providing the substrate that lets it happen.

---

## Acknowledgements

This thesis was articulated across a working session on
2026-05-20 between the kask team and the SimOps implementation
assistant. The clarifying insights — that synthetic data **is**
training data; that SimOps calibrates the synthesizer; that the
visible artifact is temporal evolution of confidence; that the
pitch is honest compounding rather than feature parity — emerged
through that dialogue and are captured here for the record.

## Cross-References

- *Agent Bestiary World: Agentic Infrastructure for Learning
  Adaptive Systems* — platform architecture this thesis assumes
- *Coherence Improvement Loop* — Loop 3 of the five RSI feedback
  channels
- *Cross-Domain Swarm Intelligence* — the same compounding
  architecture in a different vertical
- SimOps v3 specs 00–13 (internal) — the load-bearing technical
  decomposition this thesis explains