Risk Twins: Test the Strategy Before It Touches a Customer

A B2B SaaS company was pivoting from product-led growth to enterprise sales. The head of sales designed a new reward function for her agents, emphasizing deal velocity and quarterly quota attainment. Reasonable instincts. Before deploying it to a single real prospect, she ran it through a simulation environment populated with several hundred completed deals.

The simulation revealed the problem within hours. Enterprise procurement cycles run four to six months and demand consultative relationship-building. Agents optimizing for velocity skipped discovery calls, overpromised features, and neglected technical validation. Simulated close rates looked promising — and sixty-day churn spiked to 34%. She revised the reward function before it ever ran live, rebalancing toward relationship quality and deal-structure soundness. By month six, enterprise ARR was building healthily at single-digit churn, against the simulated 34%.

That environment is a Risk Twin: a simulated copy of a deployment, run in accelerated time across thousands of scenarios, that surfaces failure modes before customers do.

The tension: we test code, not strategy

Most engineering organizations would never ship a code change without running it against a test suite. The idea is uncontroversial. Yet the same organizations routinely ship strategy changes — a new reward weight, a revised escalation threshold, a shifted objective function — straight into production, validated by nothing more rigorous than "this looks reasonable."

Steel-man the practice: strategy has always been untestable. You couldn't run a counterfactual on a pricing decision; you made the call and watched the quarter unfold. The conventional view treats strategic experimentation as inherently expensive and slow, so organizations ration it. They run a few big bets a year and live with the results.

That assumption was true when strategy lived in human heads and quarterly plans. It stops being true the moment strategy lives in code. When the objective function is a version-controlled artifact, when the company's behavior is defined by policies in a repository, a proposed change becomes something you can replay against history. The constraint was never strategy's nature. It was the medium.
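To make "replay against history" concrete, here is a minimal sketch in Python. The `Deal` schema, the two reward terms, and the sample records are all illustrative assumptions, not the model behind the opening example. It ranks completed deals by the reward a policy would have paid, then reports how often the most-rewarded deals churned.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Deal:
    """One completed deal from the historical record (hypothetical schema)."""
    days_to_close: int
    discovery_calls: int
    churned_within_60_days: bool


@dataclass(frozen=True)
class RewardWeights:
    """The policy under review: the artifact that lives in the repository."""
    velocity: float
    discovery: float


def reward(deal: Deal, weights: RewardWeights) -> float:
    # Faster closes earn more velocity credit; each discovery call earns
    # relationship credit. Both terms are illustrative assumptions.
    velocity_term = 100.0 / deal.days_to_close
    return weights.velocity * velocity_term + weights.discovery * deal.discovery_calls


def replay_churn(history: List[Deal], weights: RewardWeights,
                 top_fraction: float = 0.5) -> float:
    """Churn rate among the deals this policy would have rewarded most."""
    ranked = sorted(history, key=lambda d: reward(d, weights), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(d.churned_within_60_days for d in ranked[:k]) / k


# Made-up records, only to show the mechanics.
HISTORY = [
    Deal(30, 0, True),
    Deal(45, 1, True),
    Deal(150, 4, False),
    Deal(180, 5, False),
]
VELOCITY_FIRST = RewardWeights(velocity=1.0, discovery=0.0)
BALANCED = RewardWeights(velocity=0.2, discovery=1.0)

print(replay_churn(HISTORY, VELOCITY_FIRST))  # 1.0
print(replay_churn(HISTORY, BALANCED))        # 0.0
```

On this toy record, the velocity-first weights reward exactly the deals that churned, and the rebalanced weights reward the ones that stayed. A real twin would simulate agent behavior rather than rank past deals, but the shape of the question is the same.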

The reframe: simulation changes the relationship with experimentation

The obvious value of a Risk Twin is catching a bad reward function before it ships. The deeper value is what it does to organizational behavior. When testing costs little more than compute and takes hours rather than months, you run experiments you'd otherwise never attempt. You discover that a reward weight you've held for years produces perverse incentives in edge cases. You find escalation thresholds set too high or too low. You stress-test governance against failure scenarios that have never occurred in production.

A firm like Aether Dynamics runs its Risk Twin continuously, in parallel with production. When a pull request proposes a policy change, the pipeline automatically deploys it to the twin in accelerated time, compressing thirty simulated days into three hours (roughly 240× real time). The approval gate isn't "this looks reasonable." It's "this passed simulation under normal conditions plus three adversarial scenarios." Strategy acquires a test suite.
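The approval gate described above can be sketched as a function a pipeline might call before merge. Everything named here is an assumption for illustration: the scenario names, the toy churn formulas standing in for real simulators, and the 10% churn ceiling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Policy = Dict[str, float]
Scenario = Callable[[Policy], float]  # returns simulated churn rate


@dataclass(frozen=True)
class ScenarioResult:
    name: str
    churn_rate: float


# Stand-in simulators: one normal run plus three adversarial scenarios.
# Each maps the policy's velocity weight to a simulated churn rate.
SCENARIOS: Dict[str, Scenario] = {
    "normal": lambda p: 0.02 + 0.05 * p["velocity"],
    "long_procurement": lambda p: 0.02 + 0.30 * p["velocity"],
    "feature_overpromise": lambda p: 0.01 + 0.25 * p["velocity"],
    "budget_freeze": lambda p: 0.05 + 0.02 * p["velocity"],
}


def gate(policy: Policy, scenarios: Dict[str, Scenario],
         max_churn: float = 0.10) -> Tuple[bool, List[ScenarioResult]]:
    """Approve only if every scenario stays at or under the churn ceiling."""
    results = [ScenarioResult(name, run(policy)) for name, run in scenarios.items()]
    failures = [r for r in results if r.churn_rate > max_churn]
    return (not failures, failures)


approved, failures = gate({"velocity": 0.8, "relationship": 0.2}, SCENARIOS)
print(approved, [f.name for f in failures])
# False ['long_procurement', 'feature_overpromise']

approved, failures = gate({"velocity": 0.2, "relationship": 0.8}, SCENARIOS)
print(approved, failures)
# True []
```

Returning the failing scenarios alongside the verdict is a deliberate choice: a reviewer sees which adversarial condition broke the policy, not just that something did.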

Figure: A Risk Twin runs thousands of scenarios in accelerated time — surfacing failures before customers do, and making recovery testable before redeployment.

The mechanism: detection before, validation after

Risk Twins do two jobs in a governance system, and both are worth naming because most teams only think of the first.

Before deployment, they're a detection instrument. Consider Klarna's customer service deployment, where an agent optimized for closure handled emotionally complex cases by applying the same script to everyone. A Risk Twin running thousands of simulated angry-customer interactions before launch would have surfaced what the resolution-rate metric obscured: high repeat-contact rates and a sentiment drop on post-contact surveys. That's a detection signal, pre-deployment, in simulation. Klarna would have seen the emotional-complexity gap before customers lived through it.

After a failure, they make recovery testable. Run the post-mortem scenario through the twin to verify that a charter amendment or model recalibration actually fixes the failure mode before you redeploy. Recovery becomes evidence-based rather than confidence-based. Teams that skip this step — that amend the charter and redeploy on faith — tend to watch the same failure recur with a different surface presentation.
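One way to picture evidence-based recovery is as a regression check: replay the post-mortem scenario under the old policy to confirm it still reproduces the failure, then under the amended policy to confirm the failure is gone. The sketch below assumes a hypothetical ten-case replay and a made-up `escalate_complex_cases` flag; neither comes from a real deployment.

```python
from typing import Callable, Dict, List

Policy = Dict[str, bool]

# Ten replayed cases from a hypothetical incident; True marks an
# emotionally complex case.
POST_MORTEM_CASES: List[bool] = [True] * 3 + [False] * 7


def repeat_contact_rate(policy: Policy) -> float:
    """Toy scenario: complex cases come back unless the policy escalates them."""
    repeats = sum(
        1 for is_complex in POST_MORTEM_CASES
        if is_complex and not policy["escalate_complex_cases"]
    )
    return repeats / len(POST_MORTEM_CASES)


def validate_fix(scenario: Callable[[Policy], float], old_policy: Policy,
                 new_policy: Policy, threshold: float) -> str:
    before = scenario(old_policy)
    after = scenario(new_policy)
    if before <= threshold:
        # The scenario does not reproduce the failure, so it proves nothing.
        return "not_reproduced"
    if after > threshold:
        return "still_failing"
    return "fixed"


OLD = {"escalate_complex_cases": False, "friendlier_script": False}
COSMETIC = {"escalate_complex_cases": False, "friendlier_script": True}
AMENDED = {"escalate_complex_cases": True, "friendlier_script": False}

print(validate_fix(repeat_contact_rate, OLD, COSMETIC, threshold=0.10))  # still_failing
print(validate_fix(repeat_contact_rate, OLD, AMENDED, threshold=0.10))   # fixed
```

The cosmetic amendment is the "different surface presentation" case: the charter changed, the failure mode did not, and the replay says so before redeployment.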

There's a discipline in knowing the limits. A Risk Twin validates strategy against historical patterns; it cannot predict black-swan events outside its training distribution. A pricing strategy that lifts revenue in simulation may collapse under a competitive pricing war the simulation never modeled. So the governance protocol watches three divergence signals on live deployment: confidence intervals widening beyond historical ranges, early performance deviation in the first 48 hours, and novelty detection for patterns outside the training distribution. When any of them fires, the protocol reduces deployment scope, increases human oversight, and activates rollback. Simulation tells you what happens in the world you've experienced. It catches the largest class of failures, the foreseeable ones, and is honest about the rest.
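The three divergence signals and the response to them could be wired together roughly as follows. This is a sketch under stated assumptions: the field names, the thresholds in `Baseline`, and the action labels are hypothetical, and a production monitor would compute each input from live telemetry rather than receive it ready-made.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Baseline:
    """Limits taken from the historical record the twin was built on."""
    max_ci_width: float
    max_early_deviation: float
    max_novelty_score: float


@dataclass(frozen=True)
class LiveSnapshot:
    hours_live: float
    ci_width: float               # width of the current confidence interval
    performance_deviation: float  # live metric minus simulated metric
    novelty_score: float          # higher means further from the training data


def fired_signals(snapshot: LiveSnapshot, baseline: Baseline) -> List[str]:
    fired = []
    if snapshot.ci_width > baseline.max_ci_width:
        fired.append("confidence_interval_widening")
    # Early deviation only counts inside the first 48 hours of deployment.
    if (snapshot.hours_live <= 48
            and abs(snapshot.performance_deviation) > baseline.max_early_deviation):
        fired.append("early_performance_deviation")
    if snapshot.novelty_score > baseline.max_novelty_score:
        fired.append("novelty_detected")
    return fired


def respond(fired: List[str]) -> List[str]:
    """Any fired signal triggers the full containment response."""
    if not fired:
        return []
    return ["reduce_deployment_scope", "increase_human_oversight", "activate_rollback"]


baseline = Baseline(max_ci_width=0.08, max_early_deviation=0.05, max_novelty_score=0.7)
pricing_war = LiveSnapshot(hours_live=12, ci_width=0.06,
                           performance_deviation=-0.12, novelty_score=0.9)

print(fired_signals(pricing_war, baseline))
# ['early_performance_deviation', 'novelty_detected']
print(respond(fired_signals(pricing_war, baseline)))
# ['reduce_deployment_scope', 'increase_human_oversight', 'activate_rollback']
```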

What to do

Build the twin from real history, not synthetic data. Populate it with completed deals, resolved tickets, actual decisions. The fidelity of the simulation is the fidelity of your historical record.
Gate strategy changes on simulation, not opinion. Wire the twin into the deployment pipeline so every proposed policy change runs at accelerated speed plus adversarial scenarios before merge.
Make recovery a drill, not a first-time event. Run post-mortem scenarios through the twin on a regular cadence. Organizations that rehearse their worst-case governance failures treat real incidents as drills they've already run.
Instrument for the failures simulation can't model. Pair the twin with live divergence monitoring so you catch the black-swan class the moment live behavior deviates from the history the twin was built on.

The principle

You cannot simulate everything. But you can simulate the foreseeable — and the foreseeable is where most expensive failures live. A Risk Twin converts strategy from an irreversible bet into a testable hypothesis. The companies that run it ship faster, not slower, because they've moved the cost of being wrong from production to simulation, where it's measured in compute instead of customers.

Adapted from the essays accompanying AI‑Born by Mehran Granfar. Themes drawn from Volume I, "The Machine Core".
