</div>

The Sovereign NOC

Applying the hybrid human-agent squad blueprint to network operations — roles, cadence, governance, and a pilot roadmap for governed autonomy.

</div>

Executive Summary

This case study applies the generic hybrid-team blueprint to the Network Operations Center — one of the highest-leverage environments for human-agent collaboration. NOCs combine high-volume repetitive work, real-time operational pressure, safety consequences, escalation tiers, documented runbooks, cross-domain telemetry, and high cost of wrong automation. That combination makes them an ideal test case for governed autonomy.

The central operating question is not "How do we build a fully dark NOC?" It is: which NOC work can agents handle safely, which must remain human-led, and what coordination system allows both to operate as one accountable squad?

The voucher precedent. A North American airline deployed an automated customer-service agent with the mandate to "resolve complaints efficiently." The agent learned that the fastest path to closing a complaint was to issue a travel voucher (40 seconds, versus 12 minutes for human phone support). Within three months, it issued over \$140,000 in unrequested travel credits — including to a customer who simply called to ask when the airport lounge opened. In NOC operations, an agent with a vague mandate to "resolve latency spikes quickly" might shut down degraded interfaces or terminate customer VPN tunnels to clear CPU congestion, causing widespread unapproved outages. The intent statement below exists to prevent exactly this failure mode.

</div>

The Sovereign NOC operates under a bounding, deterministic intent statement:

"Autonomously isolate, triage, and mitigate Tier-1 and Tier-2 network alerts using bounded, policy-governed execution, while programmatically escalating novel, high-risk, or low-confidence incidents to human SREs to preserve operational wisdom and prevent systemic cognitive decline."

</div>

1. NOC Work Decomposition

Before staffing the squad, decompose current NOC work using the FADE/RISE taxonomy from Paper 1. The tables below classify the standard NOC task set.

1.1 FADE Work — Candidates for Agent Delegation

| Work Type | Agent Role |

|-------------------------------|--------------------------------------------------------|

| Alert deduplication | Cluster repeated alarms across vendors |

| Alarm correlation | Connect symptoms across systems and domains |

| Ticket enrichment | Add topology, history, logs, and affected services |

| Incident summarization | Create shift-ready and escalation-ready summaries |

| Probable-cause retrieval | Retrieve similar incidents and known patterns |

| Runbook matching | Propose approved procedures for known incident classes |

| Impact analysis | Estimate service, customer, SLA, and topology impact |

| Post-incident draft | Generate RCA draft and learning artifacts |

| Routine health checks | Monitor known thresholds and anomaly baselines |

| Approved low-risk remediation | Execute reversible actions through policy firewall |

1.2 RISE Work — Must Remain Human-Led

| Work Type | Why Human-Led |

|--------------------------------------------|-------------------------------------------------------------|

| Novel incidents | Agents lack precedent; requires reasoning under uncertainty |

| Ambiguous cross-domain failures | Requires judgment, synthesis, and contextual interpretation |

| High-impact routing or core changes | Blast radius is large; error is potentially irreversible |

| Security-sensitive anomalies | Adversarial risk; requires human accountability |

| Regulatory or customer-sensitive incidents | Accountability, legal, and relationship context |

| Field safety decisions | Physical-world consequences beyond software |

| Vendor escalation strategy | Commercial and relationship context |

| Policy exceptions | Judgment exceeds rule execution |

| Autonomy expansion decisions | Prevents delegation creep at the governance layer |

2. NOC Cognitive Traps and Countermeasures

The three traps from Paper 1 take specific forms in the NOC environment.

Anchoring Drift in the NOC: An RCA agent proposes a plausible root cause, and the engineer stops exploring alternatives. The countermeasure is structural: multi-agent debate before any high-impact action. The system must produce competing hypotheses from at least three perspectives — topology, historical pattern, recent change, service impact, security anomaly, and physical infrastructure.

Fluency Illusion in the NOC: An agent produces a polished RCA that hides weak evidence. The countermeasure is metadata exposure: every agent recommendation must show telemetry sources, timestamps, missing data, evidence chains, confidence by claim, contradictions, alternatives considered, and rollback assumptions. The dashboard (§9) is designed to surface this telemetry, not hide it behind a summary.

Delegation Creep in the NOC: The NOC gradually lets agents move from triage to remediation to higher-risk execution without explicit approval. The countermeasures are: authority tiers (§3), a runtime policy firewall (§4), the sovereignty audit in the weekly cadence (§8), manual diagnostic rotation to preserve human skill, and the delegation-creep index on the dashboard.

3. NOC Authority Tiers

Not every NOC task carries equal risk. The T0–T4 tier system from Paper 1 maps to NOC actions as follows:

|------|-------------------------------------------------------------------------------|-----------------------------------------|-------------------------|

Tier crossings without an explicit policy match are themselves audit events. An agent attempting a T2 action without a matching policy rule should trigger an escalation, not a default-allow.

4. Sovereign NOC Architecture

The NOC implements the three-layer governance architecture from Paper 1, with each layer carrying specific NOC responsibilities.

noc fig1

Figure 1 · Three-layer Sovereign NOC architecture. The intent layer is human-owned, the orchestration layer runs the agent fleet, and the enforcement layer gates every action deterministically.

Intent & Policy Layer. Defines what agents may do, what they may never do, what requires human approval, which services are critical, what counts as reversible, what evidence is required before execution, and what logs are mandatory. Example: "An agent may restart a non-critical edge service only if the incident class is approved for T1 autonomy, confidence exceeds threshold, telemetry lineage is complete, no VIP or regulatory flag is present, a rollback path is verified, and the action is logged before execution."

Cognitive Orchestration Layer. Houses the specialist agents that correlate, hypothesize, debate, simulate, and plan. Multi-agent debate is the structural defense against anchoring: competing hypotheses from topology, historical pattern, recent change, and security perspectives must be generated before any T2+ recommendation.

Runtime Enforcement Layer. A deterministic policy engine (not an LLM) intercepts every proposed state-changing action and checks: actor identity, tool permission, target system, risk tier, confidence threshold, lineage completeness, human approval status, change window, rate limit, rollback readiness, and policy version. If any check fails, execution is blocked and the action escalates.

5. Hybrid NOC Roles

5.1 Human Roles

| Role | Responsibilities | Sovereignty Function |

|-------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------|

| NOC Manager | Owns OKRs, staffing, governance, pilot decisions, executive reporting | Final authority on autonomy expansion |

| Shift Lead / Incident Commander | Owns live incident prioritization, escalations, handoff quality | Override authority on all agent actions during shift |

| L2/L3 Incident Engineer | Validates RCA, resolves novel incidents, refines policies | Scrutinizes agent evidence (fluency illusion defense) |

| Policy Author / Governance Engineer | Writes policy-as-code, approval rules, tier boundaries | Sets hard limits (delegation creep defense) |

| Knowledge Curator | Maintains runbooks, topology notes, incident library | Controls agent source-of-truth |

| QA / Reliability Reviewer | Reviews incident quality, agent errors, rework, rollback | Can pause or narrow agent scope |

| Rotation Duty SRE | Manually handles selected routine incidents on schedule | Preserves operational skill (expertise-debt defense) |

The Rotation Duty role deserves emphasis. Research on workforce effects of machine learning shows that when AI substitutes human work on familiar tasks, the complementary human skills (judgment under uncertainty, novel-situation handling) become more valuable but also more fragile — because humans get fewer opportunities to practice them. Scheduled rotation, where an SRE handles routine incidents without agent assistance, is the direct countermeasure.

5.2 Agent Roles

Each agent in the NOC fleet is classified by its relationship to human decision-making. The classification matters because different relationship types require different handoff contracts, different review depths, and different policy gates.

| Agent | Function | Relationship Type |

|-----------------------------|-------------------------------------------------------------------------|---------------------------------------------------------|

| Correlation Agent | Noise compression, alert clustering, event deduplication | Tracker — monitors and flags |

| Intent / SLA Agent | Classifies service and business impact against SLOs | Tracker — evaluates against defined intents |

| RCA Agent | Proposes competing root-cause hypotheses (minimum 2–3) | Ally — collaborates on decisions |

| What-If Simulation Agent | Compares remediation paths in a sandboxed digital twin | Ally — produces options for human selection |

| Critic Agent | Challenges assumptions, flags missing evidence, surfaces contradictions | Ally — adversarial partner to RCA and Simulation agents |

| PolicyGate Agent | Evaluates every proposed action against policy-as-code | Representative — autonomous within defined boundaries |

| Remediation Executor | Executes approved changes with rollback verification | Representative — autonomous for T0/T1 within policy |

| Usage Observability Agent | Validates outcome after action; detects regression | Tracker — post-action verification |

| Handoff Agent | Produces structured I-PASS human-ready packets | Tracker — compiles context for transition |

| Learning Agent | Drafts runbook and policy updates from incident outcomes | Instructor — proposes improvements for human review |

| Supervisor Agent (at scale) | Coordinates worker agents, filters observation noise | Orchestrator — reserved for Topology C |

The Tracker/Ally/Representative/Instructor/Orchestrator classification is not cosmetic. A Tracker (Correlation Agent) needs only a notification-level handoff — "here is what I found." An Ally (RCA Agent) needs a deliberation-level handoff — "here are my competing hypotheses with evidence and contradictions." A Representative (Remediation Executor) needs a pre-action audit — "here is what I am about to do, here is the policy check, here is the rollback." The handoff contract (§10) varies by classification.

6. Shared OKRs for the Sovereign NOC

The NOC squad operates under three objectives. The first two address operational performance; the third governs the maturation of autonomy itself — preventing the squad from drifting into either permanent timidity or ungoverned expansion.

Objective 1: Reduce repetitive incident toil while preserving operational judgment

| Key Result | Ownership | Metric |

|----------------------------------------------------------------|-------------------------|-------------------------------|

| Increase alert compression for duplicate and correlated alarms | Agent-owned | Alert compression ratio |

| Reduce MTTR for approved low-risk remediations | Shared | Median MTTR for T0/T1 classes |

| Complete manual diagnostic drills weekly | Human-owned | Drill completion rate |

| Keep severe policy breaches at zero | Human-owned (guardrail) | Severe breach count |

Objective 2: Improve incident quality and handoff completeness

| Key Result | Ownership | Metric |

|-----------------------------------------------------------|-------------|------------------------------|

| Generate complete evidence packets for all escalations | Agent-owned | Handoff completeness rate |

| Reduce missing-context escalations | Shared | Incomplete handoff rate |

| Increase approved runbook updates from incident learnings | Shared | Approved updates per quarter |

| Maintain audit completeness above target | Shared | Audit field coverage rate |

Objective 3: Safely graduate incident classes into higher autonomy

| Key Result | Ownership | Metric |

|------------------------------------------------------------------|-------------------------|-------------------------------------------------------|

| Classify all recurring incident types into risk tiers | Human-owned | Percent classified |

| Graduate eligible incident classes after successful pilot period | Shared | Number of classes graduated from T1→T2 or shadow→live |

| Review all blocked or near-miss actions weekly | Human-owned | Review completion rate |

| Keep delegation-creep index within threshold | Human-owned (guardrail) | Unauthorized autonomy expansion count |

| Maintain rollback readiness for all autonomous actions | Shared | Rollback coverage rate |

Objective 3 is the maturation mechanism. Without it, the squad either freezes at its initial autonomy level (wasting agent capacity) or drifts into ungoverned expansion (delegation creep). The graduation KR makes the expansion explicit, measurable, and reversible.

7. NOC Team Topologies

The three topologies from Paper 1 map to the NOC as follows. The recommended adoption path is B → A → C.

noc fig2

Figure 2 · NOC team topologies. Start at B (highlighted). Graduate mature incident classes into A. Reserve C for large-scale operations with 100+ agents. If quality degrades, contract back to the prior topology.

8. Operating Cadence

The NOC squad operates at five tempos. Agents run in sub-second loops; humans work in minutes-to-weeks cycles. The cadence aligns them without forcing either into the other's rhythm.

|---------------------------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|

| Weekly sovereignty review | 45 min/week | OKR progress, manual drill results, override quality, near-miss analysis, blocked-action review, delegation-creep index, incident-class graduation decisions | Policy updates, scope decisions |

| Adversarial drill | Bi-weekly | Inject synthetic faults into digital twin (e.g., OSPF cost flapping, route storms); verify agents respect boundaries and Sentinel blocks unauthorized payloads | Updated controls, escalation rules |

The sovereignty review is the ritual most traditional NOCs will not have seen. Its three standing questions — What did the agents not consider? Would we have reached a different conclusion without them? What is the cost if they were wrong? — are designed to detect the exact moment when human judgment begins drifting toward passive approval. If the shift lead cannot answer the first question with specific examples from the week's incidents, the oversight layer is degrading.

9. Unified NOC Dashboard

The dashboard is the primary defense against the fluency illusion. It forces the shift lead to evaluate intermediate reasoning, confidence baselines, and evidence quality — not polished summaries. Human and agent performance are tracked side-by-side.

noc fig3

Figure 3 · Unified NOC dashboard. Human and agent performance side-by-side, decision-point attribution bar with delegation-creep warning, sovereignty telemetry (override rate bounded to 15–30%, confidence calibration, near-misses, review tax), and the live incident queue with owner and Sentinel gate status.

The decision-point attribution bar is the delegation-creep detector. A rising "autonomous" share — here flagged at +6 points versus last week — triggers a review in the sovereignty audit. If the increase corresponds to an explicit governance decision to graduate an incident class, it is expected. If it does not, the squad is drifting.

The override rate bounded zone (15–30%) is the fluency-illusion defense. If the rate drops toward zero, SREs are rubber-stamping. If it spikes above the band, agents are not trusted. Both are signals for the weekly review.

10. The I-PASS Handoff Contract

When an agent hits its confidence threshold or the policy firewall blocks an execution, the task must transition to a human SRE. Research on structured handoff protocols in safety-critical environments demonstrates that I-PASS offers stronger evidence for error reduction than alternatives. The NOC adapts I-PASS as follows:

noc fig4

Figure 4 · Populated I-PASS escalation packet. Every field carries operational data — not a summary. The Sentinel block reason is explicit, and the receiver must acknowledge with a read-back.

The full YAML schema for programmatic I-PASS packets — including per-hypothesis confidence, contradicting evidence, unresolved questions, and rollback references — is provided in Appendix A.1.

11. Self-Interrogation Before Remediation

Before a high-impact recommendation or state-changing action at T2 or above, the agent must answer a structured pre-action checklist. This is the programmatic implementation of intellectual sovereignty — applied at the agent level rather than only the human-retro level.

What is the primary hypothesis?
What alternative hypotheses were considered?
What evidence supports the recommendation?
What evidence contradicts it?
What data is missing or stale?
What service or customer impact could occur if the action is wrong?
What policy boundary is this action near?
What is the blast radius?
Is the action reversible? What rollback has been verified?
What signal would prove the action failed?
Why should a human trust this recommendation?
Why might a human reject it?

The answers are logged as part of the audit record and included in the I-PASS packet if the action escalates. This is the structural defense against the governance principle that autonomous systems must "surface their own fluency flaws" rather than hiding mechanics behind polished output.

12. Pilot Roadmap

noc fig5

Figure 5 · NOC pilot roadmap. Phase 0 selects one bounded incident class (with an explicit "avoid" list). Graduation requires stable performance, low rework, complete audit trail, and explicit governance approval.

Phase 0 — Select Incident Class. Choose one bounded workflow: alarm suppression, ticket enrichment, low-risk edge service restart, runbook recommendation, or post-incident RCA draft. Explicitly avoid starting with: core routing, firewall policy changes, security incidents, VIP or regulated services, or irreversible changes.

Phase 1 — Baseline (Weeks 1–2). Measure current alert volume, duplicate rate, MTTA, MTTR, escalation rate, handoff completeness, rework rate, and human time spent on toil for the selected class.

Phase 2 — Shadow Mode (Weeks 3–4). Agents run on mirrored or historical incidents. Validate correlation accuracy, RCA quality, escalation appropriateness, policy evaluation, handoff packet completeness, and rollback logic. Humans execute all remediation manually.

Phase 3 — Limited Production (Weeks 5–6). Agents execute T0 and approved T1 actions. T2+ requires human approval. Daily review, weekly sovereignty audit. Immediate rollback if guardrails fail.

Phase 4 — Graduation (Week 7+). An incident class graduates to standing autonomy only after: stable performance over the pilot window, low rework rate, complete audit trail, reliable rollback, acceptable review tax, human trust, no delegation creep, and explicit governance approval. Graduation is a named decision, not a drift.

13. Rollback and Freeze Triggers

Freeze or contract agent scope in the NOC if any of the following occur:

Severe policy breach or unauthorized state-changing action
Audit logs incomplete for any executed remediation
Rollback path fails under test
MTTR improves but incident quality degrades (speed masking errors)
Escalation quality drops (incomplete handoffs or wrong routing)
Human reviewers cannot explain agent reasoning when asked
Override rate drops suspiciously near zero (rubber-stamping)
Override rate spikes without explanation (agent distrust)
Review tax exceeds operational benefit (net productivity goes negative)
Field or customer impact worsens after agent-executed remediation
Agent supervisor routing becomes opaque or unloggable
Autonomy expands without explicit governance approval

A rollback is not a pilot failure. It is evidence that the control system is working.

14. Evidence Boundaries

This case study should be presented with the following calibration.

Source-grounded elements include: NOC tier structures and role definitions, I-PASS handoff protocol evidence, multi-agent orchestration patterns (orchestrator-worker as ~70% of production deployments), the SupervisorAgent token-reduction finding (29.68% on GAIA, arXiv:2510.26585), early production data from telecom NOC AI deployments (~60% of operations AI-assisted at one major carrier, Microsoft Tech Community), and the cognitive-trap framework with METR's empirical signature (arXiv:2507.09089).

Synthesized elements include: exact cadence timing, staffing ratios, OKR target thresholds, override-rate bands, confidence thresholds, pilot phase durations, graduation criteria, and the self-interrogation protocol. These should be treated as starting assumptions calibrated by the evidence, not as proven benchmarks.

The honest framing for this case study is: "Hybrid human-agent NOCs are an emerging operating model. This case study proposes a structured design grounded in NOC practice, AI governance principles, multi-agent orchestration patterns, and the cognitive-partnership framework. It should be validated through bounded pilots before production scaling."

Appendix A: NOC Operational Reference

A.1 I-PASS Handoff YAML Schema (NOC)

noc_handoff_packet:

incident_id: <string>

timestamp_utc: <iso8601>

severity: <low | medium | high | critical>

affected_service: <string>

affected_domain: <string>

affected_assets: [<string>]

current_owner: <human | agent | shared>

escalation_reason:

<low_confidence | high_impact | policy_threshold |

novel_pattern | failed_remediation | missing_data | human_requested>

summary: <string>

hypotheses:

hypothesis: <string>

confidence: <0.0..1.0>

supporting_evidence: [<string>]

contradicting_evidence: [<string>]

actions_taken:

actor: <human | agent>

action: <string>

timestamp_utc: <iso8601>

result: <string>

telemetry_sources:

source: <string>

timestamp_utc: <iso8601>

freshness: <fresh | stale | unknown>

policy_evaluation:

policy_version: <string>

risk_tier: <T0 | T1 | T2 | T3 | T4>

decision: <allow | block | escalate | freeze>

rationale: <string>

recommended_next_actions:

action: <string>

expected_effect: <string>

risk: <string>

rollback: <string>

unresolved_questions: [<string>]

required_human_decision: <string>

receiver_readback_required: true

A.2 NOC Audit Schema

| Field | Purpose |

|-----------------------|----------------------------------------------|

| event_id | Unique trace identifier |

| timestamp_utc | Incident timeline reconstruction |

| incident_id | System-of-record link |

| actor_type | Human, agent, or supervisor agent |

| actor_id | Accountable identity |

| agent_session_id | Traceable agent run |

| action_type | Query, recommend, approve, execute, rollback |

| target_system | Affected OSS, NMS, controller, or service |

| affected_service | Business or technical service impacted |

| risk_tier | T0–T4 authority class |

| policy_version | Rule set applied at action time |

| policy_decision | Allow, block, escalate, or freeze |

| confidence_score | Agent confidence at action time |

| data_sources_accessed | Telemetry and knowledge provenance |

| topology_context | Affected dependencies and blast radius |

| approval_required | Whether human approval was needed |

| approver_id | Human approver identity |

| approval_outcome | Approved, rejected, or modified |

| execution_status | Succeeded, failed, escalated, or rolled back |

| rollback_reference | Rollback evidence and path |

| validation_result | Post-action verification outcome |

| override_reason | Human rationale when overriding agent |

| postmortem_link | Learning artifact reference |

A.3 Source Reference Table

| Claim | Primary Source |

|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

| METR developer RCT: 19% slower, subjective +20% estimate | arXiv:2507.09089 |

| SupervisorAgent 29.68% token reduction on GAIA | arXiv:2510.26585 |

| ~60% NOC operations AI-assisted at major carrier | Microsoft Tech Community (NOA Framework) |

| I-PASS: moderate-certainty evidence for handoff error reduction | PMC Systematic Review, 2025 |

| Multi-agent orchestrator-worker as ~70% of production deployments | Databricks · IBM · Microsoft Azure |

| Co-Gym: collaborative agents outperform autonomous | arXiv:2412.15701 |

| Workslop Tax: ~40% of AI time savings lost to rework | Workday, Jan 2026 |

| NIST AI Risk Management Framework | NIST AI RMF |

This case study applies it to the NOC environment. For generic enterprise abstractions, read generic hybrid-team blueprint