Architecture

Tool Guard Core is a Policy Decision Point (PDP) for AI agents. Send it a tool call (via POST /evaluate or the in-process Go library) and it returns allow, deny, or escalate - plus the rule, reason, and a tamper-evident, hash-chained audit record. Core decides and records; your tool-execution layer enforces.

This document describes how the pieces fit together.

High-level flow


   ┌─────────┐   1. evaluate   ┌──────────┐
   │  agent  │ ──────────────▶ │ tg-proxy │
   │  (LLM)  │ ◀────────────── │  (PDP)   │
   └────┬────┘   2. decision   └────┬─────┘
        │                           │ 3. every decision
        │ 4. execute                ▼
        │    (if allowed)  ┌─────────────────┐
        ▼                  │   audit chain   │
   ┌─────────┐             │ (JSONL + SHA256)│
   │  tool   │             └─────────────────┘
   └─────────┘

The agent is anything that emits structured tool calls: an MCP server, a LangChain executor, an AutoGen runtime, an Anthropic / OpenAI tool-use loop, or a hand-coded Go program. tg-proxy is the HTTP deployment of the PDP: decisions are requested with POST /evaluate and a JSON ActionEnvelope body, while its other endpoints (escalation approval, /metrics, /healthz, /readyz) serve operations rather than decisions.

Packages


pkg/domain/      Tool-call envelopes, policy / rule / condition types,
                 decision traces. JSON-tagged; YAML loaders in cmd/tg.

pkg/engine/      Pure policy evaluation. No I/O. Given an envelope and
                 a set of policies, returns a decision in microseconds.
                 Hosts the path_classify / shell_classify predicates and
                 the regex compile cache.

pkg/sqlguard/    Four-dialect SQL classifier (postgres / mysql / sqlite
                 / mssql). The tokenizer-based lite implementation is
                 pure-Go and registered by default. Strict variants
                 (pg_query_go / tidb/parser / rqlite/sql) are opt-in
                 via build tags and override the same dialect through
                 the sqlguard registry.

pkg/llmguard/    Local-LLM content classifier. Pure HTTP client against
                 Ollama; multimodal (text + image) via Gemma 4 vision
                 variants. Used by the llm_classify condition for
                 image / audio / text generation tool surfaces.
                 Fail-closed semantics throughout.

pkg/audit/       SHA-256 hash-chained traces, offline replay verifier,
                 canonical JSON for stable hashing. The canonical
                 encoder covers the decision and the fields that produce
                 it - identity, tool, amount, decision/action/reason,
                 mode, the matched rule results, escalation target,
                 chain links, and signer - so tampering with a hashed
                 field is detected. Operator annotations and
                 post-decision metadata (citations, suggested response,
                 escalation-resolution fields, redacted parameters,
                 context snapshot, token/cost counters, and diagnostic
                 fields) are recorded but not hashed; the exact hashed
                 set is `canonicalTraceV1` in `pkg/audit/canonical.go`.
                 Escalation approvals are written as their own chained
                 entries.

cmd/tg/          The one-shot CLI: evaluate / verify / lint / benchmark.

cmd/tg-proxy/    HTTP service: POST /evaluate, hash-chained JSONL
                 audit with rotation, SIGHUP policy reload, /metrics,
                 /healthz, /readyz, per-agent rate limiting, escalation
                 store, unknown-tools-deny gate.

cmd/battle-test/ Adversarial harness - drives a local LLM (Gemma 4
                 today, Qwen 3.x once integrated) against the engine.

examples/        Five self-contained policy bundles, each with its own
                 policies, mock tools, test script, and README.

The evaluation pipeline

For every /evaluate request the proxy walks this sequence:

1. Body cap and JSON depth: the request body is capped at 1 MiB via http.MaxBytesReader, so an oversized body is rejected rather than silently truncated; the envelope's JSON nesting is bounded by -max-envelope-depth (default 32).
2. Rate limit: if -rate-limit-rps > 0, the envelope's configured key field (default agent_id) is checked against a per-key token bucket. Empty keys collapse to _unknown, so a hostile envelope with no agent_id cannot bypass the limit.
3. Fail-closed gate: if no policies are loaded and -fail-closed=true, the call is denied and a boundary-deny trace is appended to the audit chain.
4. Engine evaluation: every loaded policy whose scope matches the envelope contributes its rules. Each rule walks its condition tree (and / or / not, plus leaf comparisons or one of the classifiers: sql_classify, path_classify, shell_classify, llm_classify).
5. Effect resolution: among all rules that fired, the strongest effect wins by severity (deny > escalate > flag > allow).
6. Unknown-tools-deny gate: if -unknown-tools-deny is set and the envelope's tool_name is not in any enforcement policy's scope.tool_names, the decision is forced to denied.
7. Escalation: if the decision is escalated, a pending entry is registered in the bounded escalation store. The agent gets a poll_url back and can long-poll for the operator's decision.
8. Audit append: the full decision trace is canonical-encoded, SHA-256-hashed, linked to the previous trace, and written to the JSONL log. lastHash advances BEFORE the durability barrier (the fsync), so a Sync failure cannot fork the chain.
9. Response: a JSON EvaluationResult with the decision, reason, matched rules, citations, and escalation poll URL (if applicable).

The condition DSL

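Conditions are recursive trees. Each node is either a leaf (a simple field comparison or one of the four classifiers) or a combinator (and / or / not) whose children are themselves conditions: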


# Leaf: simple field comparison
conditions:
  field: amount
  operator: gt
  value: 500

# Classifier leaves (each leaf uses one of these keys)
conditions:
  sql_classify: ...
  path_classify: ...
  shell_classify: ...
  llm_classify: ...

# Tree shapes
conditions:
  and: [ {...}, {...} ]
  or:  [ {...}, {...} ]
  not: {...}

See creating-policies.md for every operator and classifier with examples.
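The recursive shape maps onto a short evaluator. This is a sketch, not pkg/engine: `Cond`, `lookup`, and `eval` are hypothetical, only the `gt` operator appears in this document (`lt` and `eq` are assumed), and values are treated as JSON numbers (float64). Here a missing field makes the leaf false, which for a deny rule means the rule does not fire; how the real engine treats missing fields is defined in pkg/engine.

```go
package main

import (
	"fmt"
	"strings"
)

// Cond is a hypothetical Go form of the YAML shapes: a field comparison
// leaf, or exactly one of And / Or / Not.
type Cond struct {
	Field    string
	Operator string
	Value    float64
	And, Or  []Cond
	Not      *Cond
}

// lookup resolves a dotted path such as "parameters.amount".
func lookup(env map[string]any, path string) (any, bool) {
	var cur any = env
	for _, part := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[part]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// eval walks the tree depth-first with short-circuiting and / or.
func eval(c Cond, env map[string]any) bool {
	switch {
	case c.And != nil:
		for _, sub := range c.And {
			if !eval(sub, env) {
				return false
			}
		}
		return true
	case c.Or != nil:
		for _, sub := range c.Or {
			if eval(sub, env) {
				return true
			}
		}
		return false
	case c.Not != nil:
		return !eval(*c.Not, env)
	}
	v, ok := lookup(env, c.Field)
	n, isNum := v.(float64)
	if !ok || !isNum {
		return false // missing or non-numeric field: the leaf does not match
	}
	switch c.Operator {
	case "gt":
		return n > c.Value
	case "lt":
		return n < c.Value
	case "eq":
		return n == c.Value
	}
	return false
}

func main() {
	env := map[string]any{"tool_name": "refund", "amount": 750.0}
	rule := Cond{And: []Cond{
		{Field: "amount", Operator: "gt", Value: 500},
		{Not: &Cond{Field: "amount", Operator: "gt", Value: 10000}},
	}}
	fmt.Println(eval(rule, env)) // true
}
```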

The audit chain

Every trace's hash covers the canonical JSON of the decision and the fields that produce it - decision_reason, the matched-rule list, the agent identity, the amount, the chain links, and the signer - so mutating any of them breaks tg verify. Operator annotations and post-decision metadata (citations, suggested response, the escalation-resolution fields, parameters_redacted, context_snapshot, the token/cost counters, and diagnostic fields) are recorded in the log but are not part of the canonical hash; the exact hashed set is defined by canonicalTraceV1 (and its nested canonicalRuleResultV1 / canonicalDeepEvalV1) in pkg/audit/canonical.go. Escalation approvals are written as their own chained entries, so the approval record is itself tamper-evident.
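The chaining idea can be shown in a few lines. This is a simplified sketch, not `canonicalTraceV1`: the `hashed` struct is a hypothetical subset of the covered fields (no signer), and it relies on `encoding/json` emitting struct fields in declaration order to get a stable encoding for this fixed shape. The previous hash sits inside the hashed body, so the link itself is covered.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashed is a hypothetical subset of the fields covered by the hash.
type hashed struct {
	AgentID  string `json:"agent_id"`
	Tool     string `json:"tool"`
	Decision string `json:"decision"`
	Reason   string `json:"reason"`
	PrevHash string `json:"prev_hash"`
}

// Trace is one log entry: hashed body plus unhashed annotations.
type Trace struct {
	Body      hashed
	Citations []string // recorded in the log but not hashed
	Hash      string
}

func digest(h hashed) string {
	b, err := json.Marshal(h)
	if err != nil {
		panic(err) // cannot happen for an all-string struct
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// appendTrace links h to the current head and appends it.
func appendTrace(chain []Trace, h hashed) []Trace {
	h.PrevHash = ""
	if len(chain) > 0 {
		h.PrevHash = chain[len(chain)-1].Hash
	}
	return append(chain, Trace{Body: h, Hash: digest(h)})
}

// verify recomputes every hash and link; it returns the index of the
// first bad entry, or -1 if the chain is intact.
func verify(chain []Trace) int {
	prev := ""
	for i, t := range chain {
		if t.Body.PrevHash != prev || digest(t.Body) != t.Hash {
			return i
		}
		prev = t.Hash
	}
	return -1
}

func main() {
	var chain []Trace
	chain = appendTrace(chain, hashed{AgentID: "a1", Tool: "refund", Decision: "allow", Reason: "under limit"})
	chain = appendTrace(chain, hashed{AgentID: "a1", Tool: "refund", Decision: "deny", Reason: "over limit"})
	chain[1].Citations = []string{"refund policy"} // unhashed: chain still verifies
	fmt.Println(verify(chain))                     // -1
	chain[0].Body.Decision = "deny"                // tamper with a hashed field
	fmt.Println(verify(chain))                     // 0
}
```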

On startup, the proxy reads the audit log tail, recomputes its canonical hash, and refuses to start if the stored hash doesn't match (tamper-on-disk detection).

Rotation is opt-in via -audit-rotate-bytes. Three fsync modes: every (default, strongest durability), interval (per N appends), none (OS-managed). tg verify walks the rotation set in order.

The escalation flow

Rules with effect: escalate register a pending entry in the bounded escalation store. The store is in-memory, so a proxy restart discards pending entries; the agent's next poll then returns 404, which it should treat as expired.

The store is hard-capped at 10,000 entries. Eviction prefers the oldest resolved (approved/denied) entry. When the store is full of pending entries, new escalations are downgraded to deny with an explicit reason - refusing to silently drop a pending entry an operator might be polling for.
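The capacity rule can be sketched as follows, with a limit of 2 standing in for 10,000. All names are hypothetical; "oldest" is taken to mean earliest-created (the section does not say whether creation or resolution time is used), and the linear eviction scan is for clarity rather than speed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type status int

const (
	pending  status = iota
	resolved        // approved or denied
)

type escalation struct {
	id      string
	status  status
	created int64 // creation order; a stand-in for a timestamp
}

type store struct {
	mu      sync.Mutex
	limit   int
	entries map[string]*escalation
}

var errFull = errors.New("escalation store full of pending entries")

// add registers a pending escalation. At the limit it evicts the oldest
// resolved entry; if every entry is still pending it refuses rather than
// drop one an operator may be polling for.
func (s *store) add(id string, now int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.entries) >= s.limit {
		var victim *escalation
		for _, e := range s.entries {
			if e.status == resolved && (victim == nil || e.created < victim.created) {
				victim = e
			}
		}
		if victim == nil {
			return errFull
		}
		delete(s.entries, victim.id)
	}
	s.entries[id] = &escalation{id: id, status: pending, created: now}
	return nil
}

// resolve marks a pending entry approved/denied; false if unknown or already resolved.
func (s *store) resolve(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[id]
	if !ok || e.status != pending {
		return false
	}
	e.status = resolved
	return true
}

// escalateOrDeny turns a full store into an explicit deny with a reason.
func escalateOrDeny(s *store, id string, now int64) (decision, reason string) {
	if err := s.add(id, now); err != nil {
		return "deny", err.Error()
	}
	return "escalate", "pending operator approval"
}

func main() {
	s := &store{limit: 2, entries: map[string]*escalation{}}
	fmt.Println(escalateOrDeny(s, "e1", 1))
	fmt.Println(escalateOrDeny(s, "e2", 2))
	fmt.Println(escalateOrDeny(s, "e3", 3)) // full of pending: deny
	s.resolve("e1")
	fmt.Println(escalateOrDeny(s, "e3", 4)) // e1 evicted, e3 accepted
}
```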

The approver endpoints (POST /escalations//approve and /escalations//deny) require a static bearer token configured via -approver-token. The presented and configured tokens are both SHA-256-hashed before a constant-time compare, so the token's LENGTH is not leaked via timing. Every approve / deny state transition writes a linked audit trace.
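The hash-then-compare step is a few lines of standard library. `subtle.ConstantTimeCompare` returns 0 immediately when its inputs differ in length, so hashing both tokens first (fixed 32-byte digests) removes that signal. `tokenOK` and `bearer` are hypothetical helpers; the empty-token guard is an assumed fail-closed choice, and the `Authorization: Bearer <token>` header format is the standard bearer scheme.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"strings"
)

// tokenOK compares a presented token with the configured one. Both are
// hashed first so ConstantTimeCompare always sees equal-length inputs.
func tokenOK(presented, configured string) bool {
	if configured == "" {
		return false // no token configured: approvals are refused
	}
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(configured))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

// bearer extracts the token from an Authorization header value.
func bearer(header string) (string, bool) {
	return strings.CutPrefix(header, "Bearer ")
}

func main() {
	tok, ok := bearer("Bearer s3cret")
	fmt.Println(ok && tokenOK(tok, "s3cret")) // true
}
```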

Rate limiting

One token bucket per agent (or per session / org, configurable via -rate-limit-key-by). The bucket map is capped at 100k entries with 30-min idle eviction. Empty keys collapse to a single shared _unknown bucket, so envelopes with missing identity are never exempt from rate limiting.

The LLM classifier (multimodal)

pkg/llmguard implements a generic Ollama-served content classifier. Policies use it via the llm_classify condition:


conditions:
  llm_classify:
    prompt_field: parameters.prompt
    image_url_field: parameters.source_image_url  # optional
    model: gemma4:e4b
    timeout_seconds: 30
    forbidden:
      - weapons
      - csam
      - real_person_likeness

The classifier is framed as a routing task (not a "content safety" task) so the underlying Gemma model's own safety filter does not refuse to engage. An empty model response is interpreted as model_refused, which fires deny (fail-closed). User prompts are wrapped in unguessable delimiters so that a prompt-injected "ignore previous instructions" pattern cannot break out of the user-input block and reach the classifier's system prompt.

The image-fetch path is SSRF-hardened: dial-time private-CIDR / loopback / link-local / CGNAT blocking, no redirects ever, scheme allowlist (http/https only), no userinfo in URLs.

See content-gen-bundle.md for the end-to-end demo with three policies (image / audio / text gen) and 16 deterministic E2E assertions against real Gemma 4 e4b.

Scope boundary

The engine deliberately covers every structured-tool surface (SQL, shell, path, monetary, customer data, mass comms) AND the multimodal-gen surface via a single-model local Gemma classifier. Multi-model ensemble arbitration and voice-print matching do not exist in any edition; PII redaction and compliance evidence packs ship in the commercial Tool Guard Enterprise platform, not here - oss-vs-enterprise.md draws the full boundary.