The Infrastructure Schism

Authority Is Not Truth: Why Autonomous AI Infrastructure Must Be Governed Against Physical Reality, Not the Model It Trusts

Executive summary

Autonomous agents are beginning to take real control of physical infrastructure, and every governance framework the industry is racing to build constrains what those agents are permitted to do, not whether what they believe about the physical world is true. The model they act on is provably wrong: in one industry study, around 35 percent of a telecom operator's asset records were erroneous, enough to justify building a system to correct them automatically (IEEE WACV 2017, Hebbalaguppe et al.), and the consequences are already on the front page, with Microsoft reporting elevated Azure latency after the September 2025 Red Sea cable cuts (Microsoft). At the physical layer the agent's actions cannot be undone inside the window that matters: you cannot recover the synchronization a training cluster loses while it is being rerouted onto a path that turns out to be shared. The frontier of safe autonomy is therefore not more permissions, more audit, or more escalation paths, but verified ground truth, with strict bounds wherever truth cannot be established. This paper names the systematic gap between the trusted model and physical reality the Infrastructure Schism, and argues that closing it is the precondition for handing irreversible control to machines.

1. You do not have a tooling problem. You have a truth problem.

Your source of truth tells you, with a confidence you have earned, that Route A is diverse from Route B. Your inventory agrees. Your carrier's contract agrees. You have a defense ready for the argument this paper is about to make, and in fact you have three.

The first defense is that your model already gives you ground truth. It does not. A model of your infrastructure, however detailed, is built from records: design intents, vendor diversity attestations, as-built drawings, inventory rows. A model drawn from records inherits every error in those records, and the records are wrong far more often than the polish of the interface suggests. The fidelity of the picture is not evidence about the fidelity of the data underneath it. Your model can be detailed and confidently mistaken at the same time, because detail is a property of the model and truth is a property of the territory.

The second defense is that this is a telco problem and not yours. The fiber belongs to a carrier, the diversity is contractual, and a signed service-level agreement names the penalty if the path is not what was sold. A penalty clause is not a physical guarantee. An SLA can compensate you after a correlated failure, but it cannot prevent one, because it cannot override geography. Two systems sold as diverse can share a chokepoint that no contract mentions, and shared physical fate is not a line item you can negotiate away. When the corridor goes, both paths go, and the contract pays out on an outage you were told could not happen.

The third defense is that you govern your automation, so you are covered. This is the most dangerous of the three, because it is half right. You almost certainly do govern your automation. You have policy gates, approval workflows, change windows, and audit trails. None of that verifies the model the automation reasons over. Governance that constrains an action while trusting the data behind it will execute a catastrophic mistake flawlessly, at machine speed, fully logged and fully compliant, onto a network that does not match the record.

Notice what all three defenses share. Every control you have governs authority. Not one of them verifies truth. So before you read further, ask the question your tooling is not built to answer: have you cross-referenced live physical telemetry from your traffic-carrying fiber against the diversity maps your vendor handed you? If you have not, you do not know that your diverse paths are diverse. You have been told they are, and you have been governing on the strength of being told.

2. The Schism is real, systemic, and far larger than fiber.

Call it by its name. The Infrastructure Schism is the systematic, unmonitored divergence between the model an autonomous system trusts and the physical reality it acts on. It is the author's term for a structural condition, not a vendor category. It shows up in any system whose control plane reasons over a representation that nothing continuously checks against the world. From here on the paper mostly just calls it the gap.

It is easiest to see in fiber, and that is where the public evidence is strongest, but the same pattern recurs wherever a model is refreshed from paperwork instead of from the world. The gap opens along at least four dimensions.

The first is logical-versus-physical diversity. This is the acute, multi-zone, front-page failure: two or more systems documented as independent that physically share a conduit, a landing station, or a geographic chokepoint. When the chokepoint fails, the modeled independence evaporates and supposedly uncorrelated systems go down together. The Red Sea makes this concrete. In February 2024, three nominally diverse systems (AAE-1, EIG, and the Seacom/Tata TGN system) were cut in close succession in the southern Red Sea, taking out on the order of 25 percent of regional traffic (HGC; TeleGeography). The cause was not a deliberate cable attack, despite the headlines that said so. It was the dragging anchor of the Rubymar, a vessel struck and abandoned in the regional conflict. The anchor scraped the seabed for days before the ship sank, an explanation the US government and one of the cable operators both supported. The distinction matters: this was correlated physical failure through a shared corridor, the exact failure mode a diversity model is supposed to rule out and routinely does not. The Baltic showed the same pattern in November 2024, when C-Lion1 and BCS East-West were cut within roughly a day of each other (Reuters). And the corridor is still live: after the September 2025 Red Sea cuts, Microsoft publicly reported added Azure latency (Microsoft), which is the gap with a hyperscaler's name attached to it.

Hold onto one precision here, because it is where careless versions of this argument fall apart. These systems share a corridor, a chokepoint they all transit, not a proven shared conduit. TeleGeography went so far as to note that two of the systems often described as separate are a single cable at the point of the cut. There is no public asset-level evidence that the rest run through one physical duct, and the corridor is enough. Correlated fate does not require a shared duct, only a shared point of failure, and the chokepoint is one.
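
The corridor-versus-conduit distinction can be made mechanical. What follows is a minimal sketch, not a description of any real inventory system: the PathRecord type, its field names, and the identifiers in the sample data are all hypothetical. It checks two paths for shared elements at three granularities, so that "no shared duct" is never mistaken for "no shared fate".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathRecord:
    """Hypothetical record of where a path physically runs, at three granularities."""
    name: str
    conduits: frozenset = frozenset()
    landing_stations: frozenset = frozenset()
    corridors: frozenset = frozenset()


def shared_risks(a: PathRecord, b: PathRecord) -> dict:
    """Return the elements two paths have in common at each granularity."""
    return {
        "conduit": a.conduits & b.conduits,
        "landing_station": a.landing_stations & b.landing_stations,
        "corridor": a.corridors & b.corridors,
    }


def is_diverse(a: PathRecord, b: PathRecord) -> bool:
    """Two paths are diverse only if they share nothing at any granularity."""
    return not any(shared_risks(a, b).values())


# Synthetic records: different ducts, different landing stations, same corridor.
path_a = PathRecord("path-a", frozenset({"duct-101"}), frozenset({"ls-east"}), frozenset({"strait-x"}))
path_b = PathRecord("path-b", frozenset({"duct-202"}), frozenset({"ls-west"}), frozenset({"strait-x"}))

risks = shared_risks(path_a, path_b)
print("shared conduits:", set(risks["conduit"]))
print("shared corridors:", set(risks["corridor"]))
print("diverse:", is_diverse(path_a, path_b))
```

A check that stops at the conduit level would pass these two paths. The corridor level is where the correlated fate shows up, and only if the record carries it.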

The second dimension is optical-margin gray failure, and it is the one most teams under-weight precisely because it never trips an alarm. The interface reads UP. The link is carrying traffic. And it is quietly dropping a fraction of a percent of packets because its physical margin has degraded below the threshold the model assumes. A fraction of a percent is invisible to a status page and ruinous to anything synchronous: it stretches tail latency and corrupts the timing that distributed training depends on. The model says the link is healthy because the model reads the link's self-report. The link's self-report is not the same thing as the link's physics.
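
A gray failure is only visible if the link's self-report is compared against an independent measurement. The sketch below is illustrative only: LinkObservation, its fields, and the 0.1 percent loss threshold are assumptions made for the example, not values taken from any standard or product.

```python
from dataclasses import dataclass


@dataclass
class LinkObservation:
    oper_status: str    # what the interface reports about itself, e.g. "UP"
    packets_sent: int   # counted by an independent probe (assumed to exist)
    packets_lost: int   # counted by the same probe


def classify_link(obs: LinkObservation, loss_threshold: float = 0.001) -> str:
    """Classify a link as 'down', 'unknown', 'gray', or 'healthy'.

    loss_threshold is an illustrative value; a real one depends on the workload.
    """
    if obs.oper_status != "UP":
        return "down"
    if obs.packets_sent == 0:
        return "unknown"  # no independent evidence either way
    loss = obs.packets_lost / obs.packets_sent
    # The interface says UP, but measured loss says otherwise: a gray failure.
    return "gray" if loss > loss_threshold else "healthy"


print(classify_link(LinkObservation("UP", 100_000, 300)))
print(classify_link(LinkObservation("UP", 100_000, 10)))
```

The status field alone would report both links as identical. The difference only appears once a measurement that does not originate from the link's own status is brought in.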

The third dimension is committed-versus-actual capacity. The model holds the capacity that was provisioned and intended. The plant carries the capacity that physical conditions currently permit. The two drift apart silently, and an automated system reasoning over the committed figure will schedule load the physical plant cannot actually carry.

The fourth dimension moves the gap inside the building: power and thermal headroom. The model assumes the headroom in the design spec. The rack has the headroom it actually holds after the realities of airflow, ambient load, and density. An autonomous scheduler that places work against assumed headroom rather than measured headroom is making the same category of error as the one that trusts a stale diversity map, one layer up the stack. This dimension has less public evidence behind it than the fiber cases, which is the honest reason fiber leads this paper. The claim here is structural, not that a named thermal incident proves it.

A useful way to feel the diversity case end to end is a route from Johannesburg to Marseille. (The following is a synthetic illustration, developed in full in Appendix B and labeled synthetic throughout; it is a worked construction, not a measured incident.) A path can be documented as cleanly diverse while it physically transits the Red Sea corridor, which gives it two illusions at once. The first is a latency illusion: the documented distance does not match the measured distance. The second is a diversity illusion: the path the model treats as independent shares the most contested chokepoint on the planet with several others. The corridor is what makes both illusions physically plausible rather than contrived, which is exactly why it is the right anchor for the synthetic case.

It is tempting to locate the whole problem in vivid mechanisms, like mapping tools that snap a cable's path to the nearest road and quietly misplace it by the width of a right-of-way. That snapping is real and it is a good story, but it is one mechanism, not the foundation of the argument. The load-bearing claim is broader and duller. Telecom asset records are wrong often enough that operators build computer-vision systems to correct them (IEEE WACV 2017, Hebbalaguppe et al.). And the network sources of truth that automation reads from are, by design, records of intended state rather than observed state, which their own maintainers warn is unsafe to automate against once it goes stale (Nautobot). The as-built record was wrong the day it was filed, and it has only drifted further since. The gap is not an exception that good tooling eliminates. It is the steady state of every infrastructure whose model is refreshed from paperwork instead of from the world.


Figure 1. The Infrastructure Schism. Governance operates on the top band, the model, which says two paths are diverse. Physical reality lives on the bottom, where both share one corridor. The unmonitored gap between them is where autonomous, irreversible action goes wrong.

3. The gap was survivable. Autonomy is about to make it permanent.

For as long as the gap has existed, it was a planning nuisance. A wrong model produced a wrong plan, a human caught the wrong plan, and the error was corrected before it touched the physical plant. The gap was real, but it was survivable, because a person stood between the bad data and the irreversible act.

That person is being removed from the loop right now, and that is the entire reason this paper exists.

Autonomy is being pointed directly at the physical layer. Hyperscalers already run software-controlled physical reconfiguration in production: Google's TPU v4 supercomputer uses optical circuit switches to reconfigure its interconnect topology under software control, and has since 2020 (Jouppi et al., TPU v4, 2023). That is not an autonomous agent making the call, and the paper does not claim it is. It establishes that the primitive, machine-directed change to the physical path, already exists and runs at scale. The destination is also now written down. ITU-T Technical Report GSTR-ION-2030, agreed at the SG15 plenary in Geneva in October 2025, lays out a strategic framework that names chief technology officers as its audience and covers AI and optical mutual empowerment, digital twins, autonomous control, and integrated sensing. It names where the industry is going. It does not close the gap on the way there.

There is an asymmetry worth reading carefully, because it is itself a piece of evidence. The telecom operators that deploy network autonomy tend to publish it, because for them autonomy is an efficiency story and efficiency is a thing you announce. The hyperscalers that deploy physical-layer autonomy tend to say very little about it, because for them it is a competitive advantage and advantage is a thing you keep quiet. The silence is not absence. When the parties with the most advanced physical-layer capability are also the parties least willing to describe it, the reasonable inference is not that they have less, it is that they have more, and that they regard it as a moat. The claim this paper defends is narrow and precise: there is no public, production-scale system that governs autonomous physical action against verified ground truth.

Here is where intellectual honesty has to do real work, so it will: there is no autonomous-failure case to cite, because physical-layer autonomy operating against this gap is barely deployed yet. This argument is not built on a disaster that has already happened. It is structural. We know the model is wrong (Section 2). We know autonomy is being aimed at the physical layer (this section). And we know one more thing that turns those two facts from a concern into a thesis.

The thing we know is irreversibility, and it belongs to the agent's own action, not to the accident that prompts it. An agent that reroutes live synchronized traffic onto a path it has just been told is diverse cannot recover the synchronization the cluster loses in the window before the mistake surfaces. An agent that reconfigures an optical switch, sheds a load, or executes a power action crosses a physical threshold that no later command un-crosses. You can always issue a second command. You cannot retrieve the work, the sync, or the margin the first one already spent. That is the relevant sense of irreversible: not impossible to reverse forever, but not recoverable inside the operating window in which the action mattered. The shift this section is about is the shift from a correctable error to an unrecoverable one, and that shift is the hinge of the entire paper. It is welded to the gap, not separable from it: the model is wrong in the first place, and now the cost of an agent acting on the wrong model has gone from a plan you catch to a consequence you live with. When autonomy meets the gap, the failure will be irreversible. That sentence is forward-looking on purpose, and it is the whole argument.

4. The Hierarchy of Truth: you cannot govern what you cannot verify.

If the problem is that authority is governed and truth is not, the governing principle has to put truth first. Call it the Hierarchy of Truth (the author's term, drawn from the Aegius work): measured physical reality overrides the documented or modeled state, every time the two disagree. The map yields to the territory. This is not a tie-breaker, it is a ranking, and the ranking is absolute, because the territory is the thing that actually carries your traffic and the map is only a claim about it.

The corollary is the thesis compressed to a single line: you cannot govern what you cannot verify. Authority without verified ground truth is not safe autonomy. It is confident, audited, policy-compliant error, which is in some ways worse than ungoverned error, because the audit trail certifies the mistake.

State the principle in two parts, because a one-part version collapses the moment Section 6 admits, honestly, that some ground truth cannot be measured. Part one: verify where you can, and let measured reality override the model wherever measurement reaches. Part two: where you cannot verify, govern as if the model is untrue, which means zero-trust assumptions about the path and deliberate hedging of the blast radius. Without the second clause the principle reads as a counsel of despair as soon as you hit a path you cannot see into. With it, the principle covers the whole field: measurement where you have it, structural caution where you do not.
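
The two-part rule is small enough to write down. This is a sketch under stated assumptions: the Proposition type and its fields are hypothetical, and each proposition is assumed to be phrased so that True is the condition the action depends on (for example, "Route A is diverse from Route B").

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Proposition:
    claim: str
    modeled: bool                    # what the record or model says
    measured: Optional[bool] = None  # None means measurement does not reach


def effective_truth(p: Proposition) -> Tuple[bool, str]:
    """Return the value governance should act on, and the basis for it."""
    if p.measured is not None:
        # Part one: measured reality overrides the model wherever it exists.
        return p.measured, "measured"
    # Part two: no measurement, so the modeled claim is treated as unproven.
    return False, "zero-trust"


print(effective_truth(Proposition("A is diverse from B", modeled=True, measured=False)))
print(effective_truth(Proposition("A is diverse from B", modeled=True)))
```

Note that the modeled value never decides the outcome on its own. It survives only when a measurement agrees with it.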

This is where the principle has to earn its keep against the obvious objection, which is that the industry already has disciplines for this. It does not, and the distinction is sharp. Intent-based networking translates intent into configuration and continuously checks that the configuration matches the intent. AIOps ingests telemetry and automates response. Both are real, both are valuable, and both share a blind spot: they verify the network against the model. Neither verifies the model against physical reality. The Hierarchy of Truth is not a better intent engine and not a smarter AIOps. It is a layer neither of them contains: the layer that asks whether the model itself is true. That is a missing layer, not a refinement of an existing one, which is precisely why no product you already own provides it.

5. The mechanism: grade governance by irreversibility, not by speed.

A principle needs a mechanism, and the mechanism follows directly from the hinge in Section 3. If irreversibility is what makes the gap catastrophic, then irreversibility is what governance should be organized around. Grade every autonomous action by two properties: how reversible it is, and how large its blast radius. The closer an action sits to an irreversible physical change, the tighter the constraint on it and the more it must be anchored to verified ground truth before it is allowed to proceed. Call this irreversibility-graded governance (the author's term).

The shape of the control loop is sense the truth, ground the decision, then bound the execution. Sense: pull real measured signal from the physical layer rather than trusting the record. Ground: reconcile the decision against that measured signal, so the action is reasoned over reality and not over the map. Bound: constrain what the execution is permitted to do as a function of its grade, so that the irreversible actions are the ones held to the strictest verification and the smallest blast radius.


Figure 2. The sense, ground, bound loop. The governing loop runs continuously: sense measured signal, ground the decision against reality, bound the execution by its grade.

The escalation is intuitive. A diagnostic read is correctable, you re-run it. A translation of intent into a concrete change is costlier, you may have built the wrong thing, but nothing live has broken. An execution against a live physical system is irreversible in the moment it occurs, and it is that last category, and only that category, that justifies the heaviest governance.
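
One way to picture the grading is as a policy table keyed by grade. The sketch below is an assumption-laden illustration, not the Aegius implementation: the Grade and Policy names, the blast-radius unit (a count of dependent services), and every threshold are invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Grade(IntEnum):
    DIAGNOSTIC = 1   # correctable: re-run it
    TRANSLATION = 2  # costly if wrong, but nothing live has broken
    EXECUTION = 3    # irreversible in the moment it occurs


@dataclass(frozen=True)
class Policy:
    requires_verified_truth: bool
    max_blast_radius: Optional[int]  # None means uncapped; unit is hypothetical


# Illustrative thresholds only; each organization would set its own.
POLICY = {
    Grade.DIAGNOSTIC: Policy(requires_verified_truth=False, max_blast_radius=None),
    Grade.TRANSLATION: Policy(requires_verified_truth=False, max_blast_radius=50),
    Grade.EXECUTION: Policy(requires_verified_truth=True, max_blast_radius=5),
}


def may_proceed(grade: Grade, truth_verified: bool, blast_radius: int) -> bool:
    """Allow an action only if it satisfies the policy for its grade."""
    policy = POLICY[grade]
    if policy.requires_verified_truth and not truth_verified:
        return False
    if policy.max_blast_radius is not None and blast_radius > policy.max_blast_radius:
        return False
    return True


print(may_proceed(Grade.DIAGNOSTIC, truth_verified=False, blast_radius=1000))
print(may_proceed(Grade.EXECUTION, truth_verified=False, blast_radius=1))
print(may_proceed(Grade.EXECUTION, truth_verified=True, blast_radius=3))
```

The design choice worth noticing is that the diagnostic grade costs nothing to pass, while the execution grade is refused on an unverified model no matter how small its blast radius.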


Figure 3. Irreversibility-graded governance. Diagnostic actions are easily reversed, translation is costly if wrong, execution cannot be undone. Governance and verification rise across that line.

This is not theoretical. A working instance of the loop exists. Aegius, a product of Tenwa AI (a DIMAGGI initiative), implements a governed sense-translate-execute loop with policy-bounded execution, in which the execution stage is constrained by policy rather than trusted by default. Be precise about what that proves and what it does not. It proves the governance loop is buildable, that policy can sit inside the loop and refuse to act on an unverified proposition. It does not prove the hard half, the cross-provider physical verification that Section 6 shows is still largely unbuilt. Aegius is the proof point for the cheap half, and it stays a proof point: the claim is that the shape is buildable because it has been built, not that you should buy a particular thing.

One concession, made plainly because it earns the argument its credibility. Deferring this governance is rational today. Physical-layer autonomy is early, the failure case has not yet materialized, and building a verification-and-grading layer ahead of the autonomy that needs it looks like over-engineering against a risk that has not arrived. That calculus is correct right now. It is also about to flip, for the reason Section 3 gave: the moment autonomy is executing irreversible physical actions against an unverified model, the cost of not having built the layer stops being theoretical and becomes the first unrecoverable failure. You build governance early not because the risk is here, but because the layer cannot be retrofitted after the irreversible action it was supposed to prevent.

6. Can you actually verify? Here is where truth stops.

The thesis would be naive if it assumed everything can be measured. It cannot. A verification claim that cannot say where it fails is not credible, so what follows marks both what is measurable and exactly where measurement runs out.

Start with what is real today. The fiber you already run can be read as a sensor, with no new hardware. The transceivers at each end of a modern link continuously track how the fiber distorts the light passing through it, because that is how they recover the data. One of the things they track, the polarization of the light, is exquisitely sensitive to physical disturbance of the glass: vibration, movement, strain, temperature. That signal is already being computed. Reading it out for sensing is a software change, not a capital project. This is not a lab curiosity. Polarization sensing has been demonstrated on live, traffic-carrying terrestrial fiber (Communications Engineering 2024, Carver and Zhou) and across a transoceanic submarine cable (Science 2021, the Curie cable). Distributed acoustic sensing adds a second channel that can locate a disturbance along the span (Science), now reflected in ITU standards work for submarine and distributed fiber sensing. The point is simple and load-bearing: the network is also an instrument, and it has been one in peer-reviewed practice for years.

Now the wall, stated without softening. Ground truth dies where administrative boundaries meet physical occlusion. If you do not own the dirt, the conduit, and the glass end to end, your deterministic visibility stops at your own demarcation point. Past it, you are reasoning about someone else's physical plant from the outside. No amount of telemetry on your own span tells you with certainty what happens to the light after it leaves your control. The measurable region is real and it is large, but it is bounded, and pretending otherwise would just rebuild the gap one level up.


Figure 4. The verification wall. Above the wall, on what you own, measurement gives deterministic truth. Below it, on third-party paths you do not own, you fall back to inference and zero-trust.

So what do you do at the wall? You do what Section 4's second clause prescribed: you shift from measurement to inference. You cannot measure the shared fate of two paths you do not own, but you can look for its signature. Two paths that share a hidden chokepoint should show the same physical disturbance at the same time, so you correlate the signatures of paths documented as diverse and look for the shared movement that betrays the chokepoint. Be honest about how far this is proven, because this is exactly where a careful reader pushes back. The sensing is demonstrated. Correlating two arbitrary leased paths you do not own, over long distances, to confirm or deny shared fate is still largely a research and proof-of-concept result, not a deployed capability. The signal is integrated over the whole path, a short shared segment can hide inside two long otherwise-diverse routes, and a clean result is far from guaranteed. Where the inference is weak or unavailable, you fall back to the second clause: route under zero-trust assumptions and treat diversity as unproven until it is shown. Measured where you can see, inferred where you cannot, zero-trust at the boundary.
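
To make the correlation idea concrete, here is a deliberately simple sketch. It assumes each path has already been reduced to a time series of disturbance magnitude sampled on a common clock, which is itself a nontrivial assumption; the function names, the 0.8 threshold, and the sample values are all synthetic. A plain Pearson correlation stands in for whatever estimator a real system would use.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def shared_fate_suspected(sig_a: Sequence[float], sig_b: Sequence[float],
                          threshold: float = 0.8) -> bool:
    """Flag two paths whose disturbance signatures move together.

    A True result is grounds for suspicion. A False result is NOT proof of
    diversity: a short shared segment can hide inside two long routes.
    """
    return pearson(sig_a, sig_b) >= threshold


# Synthetic disturbance magnitudes for three paths documented as diverse.
path_a = [0.1, 0.1, 0.9, 0.8, 0.1, 0.1]
path_b = [0.2, 0.1, 1.0, 0.7, 0.2, 0.1]  # disturbed at the same time as path_a
path_c = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1]  # disturbed at different times

print(shared_fate_suspected(path_a, path_b))
print(shared_fate_suspected(path_a, path_c))
```

The asymmetry in the docstring is the important part. A positive result lowers trust in the diversity claim; a negative result does not raise it, which is why the zero-trust fallback remains in force.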

And here is the opening the whole paper has been circling. Every primitive named here exists. Polarization sensing exists, distributed acoustic sensing exists, the standards work exists (ITU, and ETSI F5G co-cable detection), and there is documented proof-of-concept work on inferring shared risk from polarization, including the IOWN Global Forum's 2025 work. What does not exist is the assembly. No one has put these primitives together into a governance layer that verifies autonomy against physical ground truth at production scale, across the ownership boundaries that matter. The science is largely ready. The layer is not built. That gap, between primitives that exist and a governance layer that does not, is the white space this paper is written into.

7. Governing the loop and securing the loop are the same problem.

One supporting point, kept supporting on purpose, because the paper has exactly one pillar and this is not a second one. A sense-ground-bound loop is itself an attack surface. The moment governance depends on measured signal, that signal becomes a target: telemetry can be spoofed, sensors can be poisoned, and the inference stage can be fed adversarial inputs designed to make a shared path look diverse or a degraded link look healthy. The catalog of these tactics against agentic and machine-learning systems is already mapped (MITRE ATLAS). The conclusion is short. Governing the loop and securing the loop are not two projects. They are one. A verification layer you cannot trust the inputs of is just the gap with extra steps, so the integrity of the sensing is part of the governance, not an add-on to it.

8. Done right, governance is cheaper than the alternatives.

The reflexive objection is that governance slows everything down, and that the price of safety is velocity. The data does not support the reflex. Blanket approval gates, the heavy-handed kind that put a human checkpoint in front of every change, do not reduce failure rates and correlate with lower-performing organizations rather than safer ones (DORA). They buy the feeling of control at the cost of throughput, and they do not deliver the safety they were imposed to provide. That finding comes from software delivery, not physical infrastructure, so use it for exactly what it supports: not that irreversible physical actions deserve a light touch, they do not, but that taxing every action equally is the wrong design. The point of grading is to spend scrutiny where it matters.

Irreversibility-graded governance is the opposite trade from blanket control. Because it concentrates control on the small set of genuinely irreversible actions and gets out of the way everywhere else, it is cheaper than blanket gates, which tax every action equally, and it is cheaper than ungoverned autonomy, which pays for its speed in the eventual unrecoverable failure. Scrutiny is a scarce resource, and grading spends it only where the action cannot be undone. You are not choosing between safety and speed. You are choosing where to spend control, and the irreversible actions are the only place it actually has to be spent. The human role improves as a result: people stop being universal gates and start defining which propositions require proof, which thresholds matter, and which actions cross into irreversibility. That is elevation, not removal.

8.5. What to start thinking about now, and where to begin.

If the argument has landed, the honest next step is still not a full deployment, because the layer that closes this gap end to end does not exist to buy yet. What follows is a posture and an on-ramp, not a rollout: the foundational work you can begin today, before the autonomy arrives, and the direction the rest of it evolves in.

Begin with the data, because that is the hard part and the part you control. Physical-layer data resists programmatic reconciliation, so start by cleaning it: reconcile your records against each other and against what you can actually measure, and stop treating an unverified field as a fact. Establish data governance over that cleaned state if you do not already have it, so that provenance, ownership, and confidence travel with every record instead of being lost the moment it lands in a source of truth. Then build an end-to-end state lifecycle for your physical-layer components, so that a path, a span, a wavelength, a chassis carries a tracked state from provisioning through change to retirement, rather than a static row that was true once.

On that foundation you can start surfacing the gap with signals you already have. Run AI-driven mismatch detection against what is measurable (observed round-trip time against documented length, measured values against expected ones), and let the model flag where the physical evidence and the record disagree. Those flags are not verdicts. Put them through quality assurance, confirm or reject each sample, and feed the result back to reinforce the framework, so the detection sharpens against your own network rather than a generic assumption. Everything past this point is customizable to each organization's operating procedures and the visibility it actually has, and it evolves as the sensing and the standards evolve.
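
The round-trip-time check is the simplest of these to sketch, because light in glass has a known speed. The example below is a minimal illustration under stated assumptions: a group index of about 1.468 is assumed as typical for standard single-mode fiber, the 10 percent tolerance is arbitrary, and the function names are hypothetical. Measured round-trip time also includes equipment delay, so the implied length is an upper bound on fiber length and a flag is a candidate for review, not a verdict.

```python
C_KM_PER_MS = 299.792458   # speed of light in vacuum, km per millisecond
GROUP_INDEX = 1.468        # assumed typical value for single-mode fiber


def min_rtt_ms(length_km: float, group_index: float = GROUP_INDEX) -> float:
    """Smallest round-trip time a fiber path of this length can physically have."""
    return 2 * length_km * group_index / C_KM_PER_MS


def implied_length_km(rtt_ms: float, group_index: float = GROUP_INDEX) -> float:
    """Fiber length implied by a measured round-trip time (an upper bound)."""
    return rtt_ms * C_KM_PER_MS / (2 * group_index)


def flag_mismatch(documented_km: float, measured_rtt_ms: float,
                  tolerance: float = 0.10):
    """Return (flagged, implied_km); flagged if the relative deviation exceeds tolerance."""
    implied = implied_length_km(measured_rtt_ms)
    deviation = abs(implied - documented_km) / documented_km
    return deviation > tolerance, implied


# Synthetic example: a path documented at 1000 km.
print(round(min_rtt_ms(1000), 2), "ms is the physical floor for 1000 km")
print(flag_mismatch(1000, 9.8))   # close to the documented length
print(flag_mismatch(1000, 14.0))  # implies a much longer physical route
```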

One principle holds throughout, and it is the discipline that keeps this honest. Ground truth is not established as a binary. It is established within tolerance. Measurement carries noise, inference carries confidence, and operations need slack, so the goal is not a true-or-false stamp on every path but a graded, tolerance-aware picture of how far the model can be trusted and where it cannot. A system that demands certainty will either stall or lie. A system that reasons in tolerances can act where the evidence supports it and hold back where it does not.
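
Reasoning in tolerances rather than binaries means allowing a third answer. The sketch below is illustrative and its names and numbers are assumptions: it compares a documented value against a measured one, given a measurement uncertainty and an operating tolerance in the same units, and declines to decide when the evidence does not support a decision.

```python
def verdict(documented: float, measured: float,
            measurement_uncertainty: float, operating_tolerance: float) -> str:
    """Return 'consistent', 'inconsistent', or 'indeterminate'.

    All four arguments are assumed to be in the same units.
    """
    gap = abs(measured - documented)
    if gap + measurement_uncertainty <= operating_tolerance:
        # Even in the worst case the record is within tolerance.
        return "consistent"
    if gap - measurement_uncertainty > operating_tolerance:
        # Even in the best case the record is outside tolerance.
        return "inconsistent"
    # The uncertainty straddles the tolerance: hold back rather than guess.
    return "indeterminate"


print(verdict(100.0, 102.0, 1.0, 5.0))
print(verdict(100.0, 110.0, 1.0, 5.0))
print(verdict(100.0, 105.5, 1.0, 5.0))
```

The indeterminate branch is what keeps the system from stalling or lying: it can act on the first two answers and fall back to caution on the third.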

This is also where the work is moving from argument into practice. Tenwa AI, a DIMAGGI initiative, is building toward exactly this, an intelligence layer for autonomous network infrastructure, and the Aegius demo (https://tenwa.ai/products#aegius-demo) is one place to see the governed-loop idea applied rather than only described.

None of this is the finished layer, and none of it removes the verification wall. It is the architecture you reason toward and the foundation you can lay now, so that when physical-layer autonomy arrives at scale, you are not handing it irreversible control over a model you never verified.

9. Where this ends: the network that reconciles itself.

Push the logic forward and the endpoint is not a better dashboard. It is an inversion of which artifact is authoritative. Today the documented model is the source of truth and the physical plant is what the model describes. In the architecture this paper argues toward, the relationship flips: the physical plant, sensed continuously, becomes the source of truth, and the documented model becomes a hypothesis the measurements are constantly testing. A route is not diverse because a record says so. It is diverse because nothing in its measured behavior betrays a shared fate, and the moment something does, its diversity rating falls on its own, without a human filing a ticket.

Follow that to its conclusion and the control plane changes character. A path that cannot be shown to be physically what the model claims does not get the traffic that depends on the claim. Verification stops being a report a human reads and becomes a precondition the system enforces. ITU-T GSTR-ION-2030 is the standards community pointing at this same horizon: AI-native optical networks, digital twins, autonomous control, and integrated sensing, named together as the direction of travel. It describes the destination. It does not yet describe the layer that makes autonomy safe to send there, which is the gap this paper has been about.

This is the disciplined extrapolation of the thesis, not a manifesto, and the boundary is worth marking plainly. A self-reconciling ground-truth layer is years of unglamorous engineering away: the sensing is fragmented, the cross-domain correlation is unproven at scale, and the hardest part, verifying what you do not own, may never be fully solved and will have to be managed rather than closed. The larger argument, that intelligent infrastructure is itself the moat and that whoever builds this layer first compounds an advantage that is hard to copy, is a separate piece of work. The bounded claim here is narrower and sturdier: the trajectory of autonomous infrastructure runs toward a network that reconciles its model against its own physics, and the organizations that start architecting for that now will be the ones able to trust their autonomy when it arrives.

10. Close the Schism first.

The instinct, when an autonomous system does something catastrophic, will be to add another control. Another approval gate, another policy, another audit hook, another escalation path. Every one of those controls governs authority, and authority was never the thing that failed. The agent that reroutes live traffic onto a path it has just been told is diverse, and that turns out to share a chokepoint with the path it was fleeing, did not exceed its permissions. It used them perfectly. It obeyed a model that was confidently, auditably wrong, and at the physical layer there was no taking the action back.

That is the uncomfortable shape of the next failure. It will not look like a breach or a rogue agent. It will look like compliance. The logs will be clean. The policy will have been followed. And the cluster will still have stalled, or the capacity still stranded, because nothing in the stack ever checked whether the world the agent acted on was the world that exists.

So the work to do now is not more permissioning. It is verification. You cannot govern what you cannot verify, and the industry is preparing to hand irreversible physical control to systems reasoning over a model it has never made them check against reality. The Infrastructure Schism was survivable as long as a human stood between the wrong model and the irreversible act. We are removing the human. Before we do, we should close the gap they were quietly covering.

Build the layer that lets your autonomy tell authority from truth. Build it before the autonomy arrives, because it cannot be retrofitted after the first action you cannot undo. Close the Schism first.

Appendix A. How the network becomes a sensor

This appendix explains the mechanism behind Section 6 in plain terms, for the reader who wants to know why reading the network as an instrument is real rather than aspirational. It deliberately stays at the level of mechanism rather than mathematics.

The core idea is that the equipment you already run is already measuring the physical world, as a side effect of moving data. A coherent optical transceiver, the device at each end of a modern high-capacity link, has to track how the fiber distorts the light passing through it, because that tracking is how it recovers the signal. Some of what it tracks, in particular the polarization of the light, is extremely sensitive to physical disturbance of the glass: a vibration, a movement, a change in strain or temperature shifts it. Those values are computed continuously, for free, while the link carries traffic. Turning them into a sensing feed is a software capability, not new hardware. That "for free" property is what makes state-of-polarization the most practical signal: it is available on ordinary deployed links without dedicating fiber or wavelengths to sensing.
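For the reader who wants to see the shape of that software capability, the sketch below treats polarization tracking as a stream of Stokes vectors and measures how far the state of polarization rotates between consecutive samples. It is a minimal illustration under stated assumptions, not a vendor implementation: how a given transceiver exposes these values is equipment-specific, and the function names, sample values, and the 0.2 radian threshold are all hypothetical.

```python
import math

def sop_rotation_angles(stokes_samples):
    """Angle in radians between each pair of consecutive Stokes vectors."""
    angles = []
    for a, b in zip(stokes_samples, stokes_samples[1:]):
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
        # Clamp against floating-point drift before taking the arccosine.
        cos_theta = max(-1.0, min(1.0, cos_theta))
        angles.append(math.acos(cos_theta))
    return angles

def disturbance_flags(stokes_samples, threshold_rad=0.2):
    """True wherever the polarization state jumped by more than the threshold.

    threshold_rad is a hypothetical value chosen for illustration only.
    """
    return [angle > threshold_rad for angle in sop_rotation_angles(stokes_samples)]

# Hypothetical samples: slow drift, one abrupt swing, then slow drift again.
samples = [
    (1.0, 0.0, 0.0),
    (0.999, 0.04, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.999, 0.04),
]
flags = disturbance_flags(samples)
```

In this toy stream only the middle transition is flagged. A real feed would need filtering against the normal polarization drift of the link, which this sketch does not attempt.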

Two complementary techniques fill in what polarization alone cannot. Distributed acoustic sensing sends light pulses down a fiber and reads the faint reflections to locate a disturbance along the span, at the cost of more specialized equipment. Optical time-domain reflectometry locates loss, bends, and breaks on spans you can actively test. Together with chassis-level power and thermal telemetry for the in-building case, these give a layered picture of physical state on the infrastructure you own.

Inferring shared fate is where it gets useful and where it gets hard. If two paths that are supposed to be physically separate show the same disturbance at the same time, that correlation is evidence they are not separate, which is the basis of inferred shared-risk detection. The honest caveat, carried straight from Section 6, is that this inference is demonstrated in controlled and proof-of-concept settings. Applying it across long, leased, third-party paths you do not own is the open problem, not a solved one: the signal is integrated over the whole route, a short shared segment can hide inside two otherwise-diverse paths, and the absence of a correlation is weaker evidence than its presence.
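The correlation argument above can be sketched in a few lines. This is a toy illustration under clean assumptions, not the production inference the paragraph describes as unsolved: the disturbance series are invented, the 0.8 threshold is hypothetical, and a plain Pearson coefficient stands in for whatever statistic a real system would use.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no evidence either way
    return cov / math.sqrt(var_x * var_y)

def shared_fate_evidence(series_a, series_b, threshold=0.8):
    """Classify two disturbance series; threshold is a hypothetical value."""
    if pearson(series_a, series_b) >= threshold:
        return "shared-fate-suspected"
    # Absence of correlation is weak evidence: this is NOT "verified diverse".
    return "no-evidence"

# Invented disturbance magnitudes sampled at the same instants on three paths.
path_a = [0.01, 0.02, 0.01, 0.90, 0.85, 0.02, 0.01, 0.02]
path_b = [0.02, 0.01, 0.02, 0.80, 0.88, 0.01, 0.02, 0.01]
path_c = [0.02, 0.30, 0.01, 0.02, 0.01, 0.25, 0.02, 0.01]

verdict_ab = shared_fate_evidence(path_a, path_b)
verdict_ac = shared_fate_evidence(path_a, path_c)
```

The asymmetry in the return values is deliberate. A high coefficient downgrades a path, but a low one returns "no-evidence" rather than a clean bill of health, because a short shared segment can stay hidden in a signal integrated over the whole route.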

The verification wall (Figure 4) is the line this honesty draws. Inside what you own, measurement gives deterministic ground truth. Across a leased wavelength or a shared consortium system, deterministic visibility stops at your demarcation point, and the layer must shift to inference: treat the path as unverified, route under zero-trust assumptions, and use correlation and latency signatures to detect alteration, degradation, or unrecorded consolidation. The wall is not a flaw in the approach. It is the line the approach has to be honest about.

Appendix B. The synthetic case (labeled synthetic)

Methodology

This is a synthetic simulation, not a measured incident. The topology, telemetry values, and failure dynamics are constructed to show how an autonomous governance layer would behave when a path's documented diversity is false. The behaviors are modeled on real failure modes (the February 2024 Red Sea anchor-drag event and the elevated Azure latency that followed the September 2025 Red Sea cable cuts) so the mechanism is realistic, but the specific figures are illustrative and not calibrated measurements. One figure of merit, the correlation coefficient that flips the path's rating, assumes the cross-provider inference works cleanly. As Section 6 and Appendix A both stress, that assumption is the unsolved part. This case therefore illustrates how the layer would behave if the inference were reliable; it does not demonstrate that it is. Figure 5 shows the case at a glance.


Figure 5. The synthetic Johannesburg to Marseille case. Two paths documented as diverse physically share the Red Sea corridor, where a single dragging anchor cuts both.

Topology: JNB to MRS

A hyperscaler provisions two paths it believes are physically isolated, from a Johannesburg (JNB) cloud region to a European hub in Marseille (MRS), engineered for low-latency synchronization of training checkpoints.

| --- | --- | --- | --- |

In the source of truth, the two share no nodes, no transit providers, and no shared-risk attributes. The model asserts absolute diversity.

The illusion

Path B's documented West African route would imply a substantially longer optical path than Path A, and a longer path sets a higher floor on latency, because light in fiber cannot cover the distance any faster. If Path B's measured latency comes back materially lower than the propagation delay its documented route physically allows, then, measurement error aside, it is not on its documented route. In the synthetic model, Path B's provider has quietly cross-connected it onto an East African system through unrecorded capacity substitution at the wholesale layer. Both paths now traverse the same Red Sea corridor. The model still reports them as diverse, because the model was never told. (Latency figures are illustrative; the point is the direction of the discrepancy, not an exact value.)
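The latency argument is simple arithmetic, and a short sketch makes the check concrete. It assumes a group index of about 1.468, a typical value for standard single-mode fiber, which puts propagation at roughly 4.9 microseconds per kilometer. The 14,000 km route length, the measured values, and the 1 ms margin are hypothetical, in keeping with the illustrative status of every figure in this appendix.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_GROUP_INDEX = 1.468  # assumed typical value for single-mode fiber

def min_rtt_ms(route_km, group_index=FIBER_GROUP_INDEX):
    """Lowest round-trip time physically possible over a fiber route."""
    one_way_s = route_km * group_index / SPEED_OF_LIGHT_KM_PER_S
    return 2 * one_way_s * 1000

def contradicts_documented_route(measured_rtt_ms, documented_km, margin_ms=1.0):
    """True if the measurement is faster than the documented route allows.

    margin_ms is a hypothetical allowance for measurement error.
    """
    return measured_rtt_ms < min_rtt_ms(documented_km) - margin_ms

# Hypothetical documented length for Path B's West African route.
documented_km = 14_000
floor_ms = min_rtt_ms(documented_km)  # about 137 ms

too_fast = contradicts_documented_route(105.0, documented_km)
plausible = contradicts_documented_route(140.0, documented_km)
```

The test is one-sided. A measurement below the floor contradicts the record, but a measurement above it proves nothing, since equipment delay and cable slack only ever add latency.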

The failure

A physical disturbance impairs Path A. The agent detects the degradation.
The agent consults the source of truth, sees Path B tagged fully diverse and operational, and initiates a machine-speed reroute, shifting the critical inter-region training traffic onto Path B.
Path B shares the corridor and was impaired by the same event. The sudden load tips it into a cascading gray failure, stalling the synchronous training cluster across the footprint. The reroute is the agent's irreversible act: the lost synchronization does not come back when the path is later corrected.

Resolution, conditional on the inference working

Under the Hierarchy of Truth, and assuming the cross-provider correlation is detectable here, the cascade is blocked before execution:

Continuous correlation. Before the event, the governance layer monitors both paths and observes that corridor-level disturbances appear on Path A and Path B together. (The coefficient is synthetic, and detecting it on unowned paths is the open problem.)
Dynamic override. The layer overrides the source-of-truth diversity record, attaches a shared-physical-fate flag to Path B, and downgrades its rating.
Blast-radius bounding. When Path A degrades, the agent attempts the standard reroute. The policy boundary intercepts it, classifies the action as irreversible with an unverified destination, and refuses to dump the full load onto the flagged path. It throttles non-critical workloads at the source instead, honoring physical evidence over the documented model.
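The three steps above can be sketched as a small policy gate. This is a minimal illustration of the behavior the case describes, not a design for the layer itself: the class, the function names, the decision strings, and the 0.93 coefficient are all hypothetical, and the correlation value is simply handed in, which sidesteps the detection problem the case is conditional on.

```python
from dataclasses import dataclass

@dataclass
class PathState:
    name: str
    documented_diverse: bool        # what the source of truth asserts
    shared_fate_flag: bool = False  # set only by physical evidence

    @property
    def effective_diverse(self):
        # Physical evidence outranks the documented record.
        return self.documented_diverse and not self.shared_fate_flag

def apply_correlation(path, correlation, threshold=0.8):
    """Dynamic override: flag the path when correlation crosses the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    if correlation >= threshold:
        path.shared_fate_flag = True

def decide_reroute(target, irreversible):
    """Blast-radius bounding: refuse irreversible moves onto a flagged path."""
    if irreversible and target.shared_fate_flag:
        return "throttle-non-critical-at-source"
    return "reroute"

path_b = PathState(name="Path B", documented_diverse=True)

# Without the governance layer, the record alone lets the reroute through.
before = decide_reroute(path_b, irreversible=True)

# Continuous correlation has observed disturbances on both paths together.
apply_correlation(path_b, correlation=0.93)  # synthetic coefficient
after = decide_reroute(path_b, irreversible=True)
```

The diversity record is never edited in this sketch. The flag sits beside it and wins any disagreement, which keeps the documented claim available for audit while denying it authority over the action.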

The honest reading of this case is not "the layer works." It is "this is the behavior the layer must produce, and the load-bearing dependency is the correlation step, which is exactly the part the industry has not yet built at production scale."

Appendix C. Source map and verification

Scoped to the factual claims in Parts I through III. The synthetic case in Appendix B is labeled synthetic at every point of use and asserts no measured figures.

| # | Claim | Where used | Source | Status |
|---|---|---|---|---|
| 1 | Telecom asset records are erroneous often enough to motivate automated correction; stated as ~35 percent | §0, §2 | IEEE WACV 2017, Hebbalaguppe et al. | Paper confirmed (WACV 2017, pp. 725-733). Figure given as ~35 percent, the midpoint of the paper's stated 30 to 40 percent range, per author direction; exact value not independently re-checked. |
| 3 | Feb 2024: three systems (AAE-1, EIG, Seacom/Tata TGN) cut in the southern Red Sea; ~25% of regional traffic; cause the Rubymar's dragging anchor | §2 | HGC; TeleGeography; US assessment | Verified. TeleGeography notes Seacom/TGN is one cable at the cut, which supports the count. |
| 8 | Hyperscalers run software-controlled physical reconfiguration (optical circuit switching), deployed since 2020 | §3 | Jouppi et al., TPU v4, arXiv 2304.01433 | Verified. (Datacenter-scale OCS counterpart: Mission Apollo, arXiv 2208.10041.) |
| 9 | ITU-T GSTR-ION-2030, agreed SG15 Geneva Oct 2025, framework naming CTOs, covering AI-optical, sensing, autonomous control | §3, §9 | ITU-T GSTR-ION-2030 (10/2025) | Verified, strongest citation. Named as Technical Report / framework, never "standard." |
| 14 | Inferring shared risk from polarization is demonstrated in PoC, not at production scale across owned and unowned paths | §6, App. A, App. B | IOWN Global Forum 2025; ETSI F5G co-cable detection | Verified as PoC / direction. The "no production-scale assembly" claim is the white-space tier. |

Maggie Nanyonga