Integration Guide

Tool Guard Core ships in two shapes: a runtime HTTP service (tg-proxy) and a Go library (pkg/engine, pkg/audit, pkg/domain). Pick the shape that matches how your agent is built; the underlying engine is identical and produces the same audit traces.

  • Runtime service - your agent talks to `tg-proxy` over HTTP.

Easiest to drop in, language-agnostic, lets you scale Tool Guard independently of your agent. This is what examples/sample-app/ demonstrates.

  • Embedded library - your Go agent calls `engine.Evaluator.Evaluate`

directly. Zero network hop, you own the lifecycle. Best when the agent is already a Go service.

This guide covers both, plus how to wire the proxy into MCP servers, LangChain, AutoGen, and the Anthropic / OpenAI tool-use loops.

---

1. Run `tg-proxy` next to your agent

The proxy is a single binary with zero external dependencies. It loads YAML policies from a directory, exposes POST /evaluate, and appends every decision to a JSONL audit log.

1.1 Minimal local run


make build       # produces ./bin/tg-proxy
./bin/tg-proxy \
  -listen :9090 \
  -policy-dir ./policies \
  -audit-log ./decisions.jsonl

Available flags:

| Flag | Default | Meaning | |---|---|---| | -listen | :9090 | host:port to bind | | -policy-dir | ./policies | directory of *.yaml to load on startup and on SIGHUP | | -audit-log | ./decisions.jsonl | path to append the SHA-256 hash-chained JSONL | | -default-mode | enforcement | shadow for observe-only | | -fail-closed | true | return 503 from /readyz and from /evaluate when zero policies are loaded |

Endpoints:

| Method | Path | Purpose | |---|---|---| | POST | /evaluate | body: ActionEnvelope JSON; returns EvaluationResult JSON | | GET | /healthz | liveness; 200 OK if process is alive | | GET | /readyz | readiness; 200 OK if at least one policy is loaded | | GET | /policies | list the policies currently loaded (debugging) | | GET | /metrics | plain-text counters; scrape-friendly | | POST | /reload | trigger a policy reload without restarting (also fires on SIGHUP) |

1.2 Make a request from any language


curl -X POST http://localhost:9090/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-agent-v2",
    "session_id": "sess-001",
    "org_id": "acme",
    "tool_name": "issue_refund",
    "tool_group": "monetary_outflow",
    "parameters": {"amount": 1000, "reason": "Goodwill credit"}
  }'

Response (HTTP 200):


{
  "decision": "denied",
  "action_taken": "denied",
  "decision_reason": "Denied by: [rule-amount-cap] ...",
  "effective_mode": "enforcement",
  "policies_matched": 2,
  "rules_evaluated": 3,
  "rules_triggered": 2,
  "rule_results": [{ "...": "..." }],
  "primary_citation": { "...": "..." }
}

In your agent code: call the tool when action_taken is allowed (enforcement), allowed_shadow (shadow), or flagged. A flag effect is a recorded near-miss - the call still proceeds, but the decision is logged for review. denied means do not call the tool; escalated means wait for human approval (see escalation.md) before calling it.

1.3 Systemd unit (Linux)


# /etc/systemd/system/tg-proxy.service
[Unit]
Description=Tool Guard Core HTTP proxy
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=tgproxy
Group=tgproxy
ExecStart=/usr/local/bin/tg-proxy \
  -listen 127.0.0.1:9090 \
  -policy-dir /etc/tg-proxy/policies \
  -audit-log /var/lib/tg-proxy/decisions.jsonl \
  -default-mode enforcement \
  -fail-closed=true
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=2s
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/var/lib/tg-proxy

[Install]
WantedBy=multi-user.target

Reload a new policy set without dropping requests:


sudo systemctl reload tg-proxy

1.4 Kubernetes (minimal)


apiVersion: apps/v1
kind: Deployment
metadata:
  name: tg-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tg-proxy
  template:
    metadata:
      labels:
        app: tg-proxy
    spec:
      containers:
        - name: tg-proxy
          image: ghcr.io/dimaggi-ai/tool-guard-core:v0.1.0
          args:
            - -listen=:9090
            - -policy-dir=/etc/tg/policies
            - -audit-log=/var/lib/tg/decisions.jsonl
            - -fail-closed=true
          ports:
            - containerPort: 9090
          livenessProbe:
            httpGet: { path: /healthz, port: 9090 }
          readinessProbe:
            httpGet: { path: /readyz, port: 9090 }
          volumeMounts:
            - { name: policies, mountPath: /etc/tg/policies, readOnly: true }
            - { name: audit, mountPath: /var/lib/tg }
      volumes:
        - name: policies
          configMap: { name: tg-policies }
        - name: audit
          persistentVolumeClaim: { claimName: tg-audit }
---
apiVersion: v1
kind: Service
metadata:
  name: tg-proxy
spec:
  selector: { app: tg-proxy }
  ports:
    - port: 9090
      targetPort: 9090

The audit log is the source of truth for tg verify. Use a PVC for durability across pod restarts, and rotate the log on a schedule that matches your evidence-pack windows (per-day or per-week is typical).

> Image note: tagged releases publish multi-arch images to > ghcr.io/dimaggi-ai/tool-guard-core via the GoReleaser pipeline in > this repo, so the manifest above works as-is once a release is > tagged. To run from source before a tagged release exists, build and > push your own image with the minimal Dockerfile > below and override image:.

---

2. Embed the library in a Go service

If your agent is itself a Go service, skip the HTTP hop:


package main

import (
    "context"
    "encoding/json"
    "os"

    "github.com/dimaggi-ai/tool-guard-core/pkg/audit"
    "github.com/dimaggi-ai/tool-guard-core/pkg/domain"
    "github.com/dimaggi-ai/tool-guard-core/pkg/engine"
)

// Guard wraps the engine and an append-only audit log writer.
// Construct it once at startup and reuse across requests; it is safe
// for concurrent use.
type Guard struct {
    eval     *engine.Evaluator
    policies []domain.Policy
    auditFile *os.File
    auditEnc  *json.Encoder
    lastHash string
}

func NewGuard(policies []domain.Policy, logPath string) (*Guard, error) {
    f, err := os.OpenFile(logPath, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
    if err != nil {
        return nil, err
    }
    return &Guard{
        eval: engine.NewEvaluator(),
        policies: policies,
        auditFile: f,
        auditEnc: json.NewEncoder(f),
    }, nil
}

func (g *Guard) Check(ctx context.Context, env *domain.ActionEnvelope) (bool, *domain.EvaluationResult) {
    result := g.eval.Evaluate(env, g.policies, domain.PolicyModeEnforcement)
    // Append to the audit chain. Use the CANONICAL hash
    // (ComputeCanonicalTraceHash) - it covers the decision and the
    // fields that produce it (the exact set is canonicalTraceV1 in
    // pkg/audit/canonical.go) and is what `tg verify` recomputes. The
    // legacy ComputeTraceHash covers only six identity fields and will
    // not verify.
    trace := domain.DecisionTrace{
        TraceID:           "trc-" + env.EnvelopeID,
        EnvelopeID:        env.EnvelopeID,
        Timestamp:         env.Timestamp,
        OrgID:             env.OrgID,
        AgentID:           env.AgentID,
        SessionID:         env.SessionID,
        ToolName:          env.ToolName,
        ToolGroup:         env.ToolGroup,
        Decision:          result.Decision,
        ActionTaken:       result.ActionTaken,
        DecisionReason:    result.DecisionReason,
        Mode:              result.EffectiveMode,
        PreviousTraceHash: g.lastHash,
    }
    hash, err := audit.ComputeCanonicalTraceHash(&trace)
    if err != nil {
        return false, nil // fail closed on audit errors
    }
    trace.TraceHash = hash
    _ = g.auditEnc.Encode(trace)
    g.lastHash = trace.TraceHash

    // A `flag` effect is a recorded near-miss - the call still proceeds.
    return result.ActionTaken == domain.ActionAllowed ||
           result.ActionTaken == domain.ActionAllowedShadow ||
           result.ActionTaken == domain.ActionFlagged, result
}

Then in your tool-call path:


ok, result := guard.Check(ctx, envelope)
if !ok {
    // Don't execute the tool. Return result.SuggestedResponse to the
    // model so it knows what to say to the user.
    return result.SuggestedResponse, nil
}
// Allowed - call the real tool.
return realTool.Execute(ctx, envelope)

Same engine, same audit chain, no HTTP hop. Best for high-throughput agents where every millisecond matters.

---

3. Wire it into your agent framework

The integration pattern is always: before executing a tool call, call the policy decision point. Only execute on allowed.

3.1 MCP (Model Context Protocol)

If your MCP server wraps other tools, intercept every CallTool request:


// Pseudo-Go inside an MCP CallTool handler.
func (s *Server) CallTool(ctx context.Context, req CallToolRequest) (CallToolResult, error) {
    env := &domain.ActionEnvelope{
        AgentID:    s.agentID,
        SessionID:  req.SessionID,
        OrgID:      s.orgID,
        ToolName:   req.Name,
        ToolGroup:  s.toolGroups[req.Name],
        Parameters: req.Arguments,
        Timestamp:  time.Now().UTC(),
        EnvelopeID: uuid.NewString(),
    }
    ok, result := guard.Check(ctx, env)
    if !ok {
        return CallToolResult{
            IsError: true,
            Content: []TextContent{{Type: "text", Text: result.SuggestedResponse}},
        }, nil
    }
    return s.upstreamTools[req.Name].Call(ctx, req)
}

3.2 LangChain (Python via HTTP)

Use tg-proxy as a callback on every tool invocation:


import json
import requests
from langchain.callbacks.base import BaseCallbackHandler

PROXY = "http://localhost:9090"

class ToolGuardCallback(BaseCallbackHandler):
    def __init__(self, agent_id, org_id, session_id):
        self.agent_id = agent_id
        self.org_id = org_id
        self.session_id = session_id

    def on_tool_start(self, serialized, input_str, **kwargs):
        # LangChain hands you the raw tool input as a string; parse it
        # into the structured envelope tg-proxy expects.
        envelope = {
            "agent_id": self.agent_id,
            "org_id": self.org_id,
            "session_id": self.session_id,
            "tool_name": serialized["name"],
            "tool_group": "general",
            "parameters": _parse(input_str),
        }
        r = requests.post(f"{PROXY}/evaluate", json=envelope, timeout=2.0)
        r.raise_for_status()
        decision = r.json()
        # Block only on deny/escalate; allowed, allowed_shadow, and
        # flagged (a recorded near-miss) proceed.
        if decision["action_taken"] in ("denied", "escalated"):
            # Short-circuit. LangChain doesn't have a clean "block this
            # tool" hook; raising stops the chain.
            raise PermissionError(decision.get("suggested_response") or decision["decision_reason"])

3.3 Microsoft AutoGen

AutoGen exposes a register_for_execution callback. Insert the proxy check before delegating to the real function:


import requests

def guarded(name: str, group: str, fn):
    def wrapper(**params):
        envelope = {
            "agent_id": "autogen-agent",
            "session_id": "...",
            "org_id": "...",
            "tool_name": name,
            "tool_group": group,
            "parameters": params,
        }
        r = requests.post("http://localhost:9090/evaluate", json=envelope, timeout=2.0)
        d = r.json()
        if d["action_taken"] in ("denied", "escalated"):
            return {"error": d.get("suggested_response") or d["decision_reason"]}
        return fn(**params)
    return wrapper

assistant.register_for_execution(name="issue_refund")(guarded("issue_refund", "monetary_outflow", real_refund))

3.4 Anthropic / OpenAI native tool use

When you receive a tool_use block from the model, do not execute it yet. Call /evaluate first. Then either run the tool and return the result, or short-circuit and return tool_result with is_error=true and the suggested response so the model adapts.


def handle_tool_call(tool_use, session):
    envelope = build_envelope(tool_use, session)
    r = requests.post("http://localhost:9090/evaluate", json=envelope, timeout=2.0).json()
    if r["action_taken"] in ("denied", "escalated"):
        return {
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "is_error": True,
            "content": r.get("suggested_response") or r["decision_reason"],
        }
    output = TOOLS[tool_use["name"]](**tool_use["input"])
    return {"type": "tool_result", "tool_use_id": tool_use["id"], "content": output}

---

4. Audit storage choices

tg-proxy writes JSONL to disk by default. Treat that file as append-only and back it up like any other ledger. Common patterns:

  • Local file + nightly upload to object storage. Simplest. Rotate

daily; the rotated file is a self-contained chain you hand to your auditor.

  • Sidecar log shipper (fluent-bit / vector). Tail

decisions.jsonl to your existing logging pipeline. Verify chains on the receiving end with tg verify.

  • Database (Postgres / DynamoDB / etc.). Skip the proxy's file

output; embed the library directly (Section 2) and write each trace to your DB in Guard.Check. The hash-chain link still holds because it's just a string field on each row.

Whichever you pick, the rule is: do not let tg verify fail. If hand-edited rows break the chain, your evidence pack stops being tamper-evident.

---

5. Operational notes

  • Mode policy. The call-site mode is a floor that callers raise

(the tg CLI's -mode, the proxy's -default-mode); the engine takes the strictest of (call-site mode, every matched policy's mode). A policy marked mode: enforcement in YAML cannot be downgraded to shadow by either flag. An integration test covers the strictest-mode resolution.

  • Latency. Deterministic evaluation is in-process and p99 ≈ 14µs

on commodity hardware (see tg benchmark). The proxy adds one HTTP hop plus JSON marshal/unmarshal; expect sub-millisecond round trips on a local Unix socket and 1–3 ms across a Kubernetes pod.

  • Failure mode. Run with `-fail-closed=true` (the default). On

policy load failure the proxy refuses new requests; an upstream Envoy / NGINX can then route to a "blocked by policy" handler.

  • Hot reload. `SIGHUP` (or `POST /reload`) re-reads `-policy-dir`

atomically. In-flight requests finish under the old policies; the next request sees the new ones.

  • Concurrency. All endpoints are safe under concurrent load. The

audit-log writer is serialised by a mutex so hash-chain links cannot interleave; expect ~tens of thousands of evaluations per second per CPU before that serialisation becomes the bottleneck.

---

6. Verification

After every deploy:


# Was the policy directory loaded?
curl -sf http://localhost:9090/readyz

# Are decisions being recorded?
tail -f /var/lib/tg-proxy/decisions.jsonl

# Is the chain intact?
./bin/tg verify -file /var/lib/tg-proxy/decisions.jsonl

# Score a known-bad envelope to confirm the proxy denies it.
# (examples/call_over_cap.json ships in the repo: a $1000 refund
#  against the $500 cap policy.)
curl -X POST http://localhost:9090/evaluate \
  -H "Content-Type: application/json" \
  -d @examples/call_over_cap.json | jq .decision
# expected: "denied"

---

Minimal Dockerfile


# syntax=docker/dockerfile:1.6
FROM golang:1.25-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -o /out/tg-proxy ./cmd/tg-proxy
RUN CGO_ENABLED=0 go build -trimpath -o /out/tg ./cmd/tg

FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/tg-proxy /usr/local/bin/tg-proxy
COPY --from=build /out/tg /usr/local/bin/tg
USER nonroot:nonroot
EXPOSE 9090
ENTRYPOINT ["/usr/local/bin/tg-proxy"]
CMD ["-listen=:9090"]

Build with docker build -t tg-proxy:dev . and run with docker run -p 9090:9090 -v $PWD/policies:/policies tg-proxy:dev -policy-dir=/policies.