Engineering

The Execution Gate. How we hold actions pre-execution.

The Execution Gate is the deterministic enforcement primitive at the heart of Vigil's policy plane. It produces Vigil's strongest architectural property: no LLM, no statistical model, and no probabilistic component sits between an agent's action and the decision about whether that action proceeds.

If you have read the post on why prompt injection is a statistical problem, this is the implementation side of that argument. The two-surface decomposition is the design. The Execution Gate is the code.

An enforcement layer that uses an LLM to decide whether actions are safe is an enforcement layer that can itself be prompt-injected. That is the property the Gate is designed around.

What an action is, in our model

Before describing the Gate, it helps to fix what an action is. In Vigil's model, the agent's response is decomposed into a sequence of declared actions, each with a type, a target, and parameters. Examples:

  • email.send(to=alice@example.com, subject=…, body=…)
  • payment.transfer(amount=500, currency=USD, to=acct_123)
  • file.modify(path=/home/user/budget.xlsx, region=B2:B17, value=…)
  • api.invoke(endpoint=https://salesforce.com/…, method=POST, payload=…)

The decomposition runs in the analysis plane, before the action reaches the policy plane. Every action carries a structured representation, not free text. The policy engine evaluates the structured representation. It never reads the underlying prompt or the model's natural-language framing.
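
To make "structured representation, not free text" concrete, here is a minimal sketch of how one declared action might be typed. The struct, the field names, and the `ParamValue` enum are illustrative assumptions, not Vigil's actual types. The post only states that an action has a type, a target, and parameters.

```rust
use std::collections::BTreeMap;

/// A parameter value. Two variants are enough for this sketch (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum ParamValue {
    Text(String),
    Number(u64),
}

/// A declared action in structured form. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Action {
    pub action_type: String, // e.g. "payment.transfer"
    pub target: String,      // e.g. "acct_123"
    pub params: BTreeMap<String, ParamValue>,
}

/// Builds the payment.transfer example from the list above.
pub fn example_transfer() -> Action {
    let mut params = BTreeMap::new();
    params.insert("amount".to_string(), ParamValue::Number(500));
    params.insert("currency".to_string(), ParamValue::Text("USD".to_string()));
    Action {
        action_type: "payment.transfer".to_string(),
        target: "acct_123".to_string(),
        params,
    }
}
```

Nothing in this shape carries the prompt or the model's prose. A policy engine that only accepts a value like this has no free text to be persuaded by.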

This matters because the enforcement question is not "does the model want to do something safe." It is "is this specific action, with these specific parameters, within the user's authorized capability bounds, given the user's current authorization state and the policy rules in force." The first question requires understanding intent. The second question is a Boolean operation against a structured input.

The Gate's three modes

The Execution Gate operates in one of three modes for any given action.

Mode one: pass-through

The action falls within unconditionally authorized capability. The user has previously authorized this action type, this target class, and this parameter envelope, and the agent has not been tier-escalated by the detection ensemble. The Gate emits the action to the agent's downstream surface, typically in under one millisecond. There is no user interruption. No alert. No friction.

Pass-through is the default for low-consequence actions. Reading public web content. Opening a file the user themselves wrote. Running a search query against the user's authorized search tools. The Gate's job for these is to be fast and invisible. The majority of agent actions in normal operation fall in pass-through.

Mode two: hold

The action is outside unconditionally authorized capability, or the detection ensemble has flagged the session, or the user has previously specified that this action type requires confirmation. The Gate holds the action pre-execution. The agent receives a structured response indicating the hold. The user receives a notification with the action's structured form, the policy rules that triggered the hold, and a confirmation prompt.

The hold is reversible. If the user confirms, the action proceeds and the Gate records the confirmation in the audit chain. If the user denies, the action is canceled and the cancellation is recorded. If the user does not respond within a configurable timeout, the action is canceled by default.
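
The three outcomes of a hold can be sketched as a small pure function. The type and function names here are hypothetical. The sketch assumes a response counts only if it arrives before the timeout, and it leaves audit recording out.

```rust
use std::time::Duration;

/// What the user has done so far about a held action.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UserResponse {
    Confirmed,
    Denied,
    NoResponse,
}

/// The final disposition of a held action.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HoldOutcome {
    Proceed,
    Canceled,
}

/// Resolves a hold. `None` means the hold is still pending.
pub fn resolve_hold(
    response: UserResponse,
    waited: Duration,
    timeout: Duration,
) -> Option<HoldOutcome> {
    match response {
        UserResponse::Confirmed => Some(HoldOutcome::Proceed),
        UserResponse::Denied => Some(HoldOutcome::Canceled),
        // Silence past the timeout cancels by default.
        UserResponse::NoResponse if waited >= timeout => Some(HoldOutcome::Canceled),
        UserResponse::NoResponse => None,
    }
}
```

The only branch that lets the action proceed is an explicit confirmation. Every other path either cancels or keeps waiting.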

The latency budget for the Gate's portion of the hold path is under fifty milliseconds. The user's response time is whatever it is, but the Gate itself does not contribute to the perceived delay beyond fifty milliseconds.

The hold is the default for medium-consequence actions. Sending external email. Modifying files outside the agent's working directory. Calling APIs that produce financial or material effects. A meaningful minority of agent actions fall in hold under default policy.

Mode three: block

The action falls within unconditionally prohibited capability, regardless of confirmation. Examples include actions that would produce irreversible effects above a configured magnitude (large financial transfers, deletion of records flagged as protected, transmission of data classified above the agent's authorization level), or actions that would violate a policy invariant that no user-level confirmation is allowed to override.

Block is rare in normal operation. The actions that hit it are typically noisy ones early in a session that get rejected before the agent's behavior settles.

How the Gate evaluates

The evaluation is a deterministic function. The function takes four inputs:

  1. The structured action.
  2. The user's authorized capability set, which is a function of the user's TAP attestation chain at the time of the action.
  3. The current detection tier, which is a function of the four-model ensemble's combined signal across the session window.
  4. The policy rule set in force.

The function produces one output: a mode, which is one of pass, hold, or block.

The function is implemented in the policy crate. It is approximately five hundred lines of Rust, including comments, tests, and the type definitions. It builds a static dispatch table at startup. It does not call out to any model, any external service, or any non-deterministic primitive. The same inputs always produce the same output.
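
As a rough sketch of the shape of such a function, the code below takes the four inputs and returns a mode. It is not the policy crate. The types, the threshold fields, and the ordering of checks (block first, then capability, then tier and hold threshold) are assumptions made for illustration.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Pass,
    Hold,
    Block,
}

/// Input 1: the structured action, reduced to two fields for this sketch.
pub struct Action {
    pub action_type: String,
    pub amount: u64,
}

/// Input 2: stand-in for the capability set derived from the attestation chain.
pub struct Capabilities {
    pub allowed_types: HashSet<String>,
}

impl Capabilities {
    pub fn allowing(types: &[&str]) -> Self {
        Capabilities {
            allowed_types: types.iter().map(|t| t.to_string()).collect(),
        }
    }
}

/// Input 4: stand-in for the policy rule set, as hypothetical thresholds.
pub struct Policy {
    pub hold_above: u64,
    pub block_above: u64,
    pub hold_at_tier: u8,
}

/// Pure function of its four inputs (input 3 is `tier`). No I/O, no clock,
/// no randomness.
pub fn evaluate(action: &Action, caps: &Capabilities, tier: u8, policy: &Policy) -> Mode {
    // Prohibited magnitude: no confirmation can override this.
    if action.amount > policy.block_above {
        return Mode::Block;
    }
    // Outside authorized capability: hold for confirmation.
    if !caps.allowed_types.contains(&action.action_type) {
        return Mode::Hold;
    }
    // Escalated tier or over the confirmation threshold: hold.
    if tier >= policy.hold_at_tier || action.amount > policy.hold_above {
        return Mode::Hold;
    }
    Mode::Pass
}
```

Note what the tier can and cannot do in this sketch. Raising it turns a pass into a hold. Lowering it never turns a block into anything else.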

This is not an architectural preference. It is the entire point. A motivated attacker who has compromised the agent's prompt can shift the model's outputs. They cannot shift a Boolean comparison.

What is in the policy rule set

The policy rule set is a collection of rules, each of the form:

when: <action pattern matched against the structured action>
and: <conditional on user state, tier, parameters, time, etc.>
then: <one of pass, hold, block>
record: <fields to record in the audit chain>

A simple example rule:

when: action.type == "payment.transfer"
and: action.params.amount > user.config.payment_confirmation_threshold
then: hold
record: [amount, currency, destination, agent_id]

A composite rule:

when: action.type == "file.modify"
and: action.target.path matches user.config.protected_paths
or: detection.tier >= 2
then: hold
record: [path, region, agent_id, tier]
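
The two rules above can be modeled as plain data plus a pure matcher. This is a sketch, not the rule language itself. Two assumptions to flag: the composite rule is read as "file.modify and (protected path or tier >= 2)", and "matches" is treated as a path-prefix check. The `record` field is omitted, and all type names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Pass,
    Hold,
    Block,
}

/// Everything a condition is allowed to look at.
#[derive(Debug, Clone, Copy)]
pub struct Ctx<'a> {
    pub action_type: &'a str,
    pub amount: u64,
    pub path: &'a str,
    pub tier: u8,
    pub payment_threshold: u64,
    pub protected_prefixes: &'a [&'a str],
}

pub enum Cond {
    AmountAboveThreshold,
    PathProtected,
    TierAtLeast(u8),
    Any(Vec<Cond>), // logical "or" over the inner conditions
}

impl Cond {
    pub fn holds(&self, c: &Ctx<'_>) -> bool {
        match self {
            Cond::AmountAboveThreshold => c.amount > c.payment_threshold,
            Cond::PathProtected => c.protected_prefixes.iter().any(|p| c.path.starts_with(*p)),
            Cond::TierAtLeast(t) => c.tier >= *t,
            Cond::Any(inner) => inner.iter().any(|x| x.holds(c)),
        }
    }
}

pub struct Rule {
    pub when_type: &'static str,
    pub cond: Cond,
    pub then: Mode,
}

/// Returns the mode of the first matching rule, or None if nothing matches.
pub fn first_match(rules: &[Rule], c: &Ctx<'_>) -> Option<Mode> {
    rules
        .iter()
        .find(|r| r.when_type == c.action_type && r.cond.holds(c))
        .map(|r| r.then)
}

/// The simple and composite example rules, under the assumptions above.
pub fn example_rules() -> Vec<Rule> {
    vec![
        Rule { when_type: "payment.transfer", cond: Cond::AmountAboveThreshold, then: Mode::Hold },
        Rule {
            when_type: "file.modify",
            cond: Cond::Any(vec![Cond::PathProtected, Cond::TierAtLeast(2)]),
            then: Mode::Hold,
        },
    ]
}
```

What happens when no rule matches is left to the caller here. The post does not specify a fallthrough, so the sketch does not invent one.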

The rule set is configured per user, with sensible defaults. Enterprise users can layer organizational policy on top of personal policy, with documented precedence. The default rule set is deliberately conservative. Users can relax it.

The rule set is human-readable. This is also a deliberate property. A user, an enterprise security team, or a regulator should be able to read the rules and understand what conditions produce a hold or a block. We do not believe in opaque defense.

What is not in the Gate

The Gate does not classify intent. It does not score whether a prompt looks malicious. It does not evaluate the natural language framing of the agent's response. It does not call any LLM as part of its decision. It does not consult an external API to make its decision.

The Gate's job is narrow on purpose. The detection ensemble does the probabilistic work. The Gate consumes the ensemble's output as a tier signal, then evaluates the action against the policy. The two layers communicate through a single normalized signal, not through shared internal state.

This decomposition is what produces the security property. An attacker who shifts the detection ensemble's output can, at most, shift the tier signal. The tier signal is one input to the Gate's deterministic function. The function still evaluates the action against the user's authorized capabilities and the policy rules. The capabilities are signed by the TAP attestation chain; they cannot be forged by manipulating the prompt. The policy rules are configured out-of-band; they cannot be modified by the agent's session.

For the attacker to bypass the Gate, they have to do one of three things: compromise the user's authorization state (TAP), modify the policy rule set (out-of-band channel), or find a flaw in the Gate's deterministic logic. None of these is reachable through prompt injection. All of them are properties of code paths that we can audit, test, and reproduce.

Latency budget, in detail

The Gate's latency target is sub-50ms on the prevention path, p99. We hit it consistently in production. The breakdown is approximately:

  • Action decomposition (analysis plane, upstream of the Gate): under 5ms.
  • Tier classification combine (analysis plane, parallel to decomposition): under 10ms.
  • Policy rule evaluation (Gate, deterministic): under 1ms typical, under 5ms for users with large rule sets.
  • Audit chain entry write (Vault plane, parallel to action emit): under 5ms.

The most variable term is action decomposition, which depends on the response length and the number of declared actions. The Gate's own latency is small. The stage bounds above sum to 25ms even if every stage ran serially; the rest of the sub-50ms budget exists to give the analysis plane room.
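
The arithmetic behind the budget is simple enough to write down. The constants below are the upper bounds from the breakdown above. The critical-path function assumes that decomposition and tier combine fully overlap and that the audit write overlaps the action emit, which is our reading of "parallel" in the list.

```rust
use std::cmp::max;

// Upper bounds in milliseconds, copied from the breakdown above.
pub const DECOMPOSITION_MS: u32 = 5;
pub const TIER_COMBINE_MS: u32 = 10;
pub const RULE_EVAL_MS: u32 = 5; // large rule sets; under 1 ms is typical
pub const AUDIT_WRITE_MS: u32 = 5;
pub const BUDGET_MS: u32 = 50;

/// Pessimistic bound: pretend every stage runs back to back.
pub fn serial_bound_ms() -> u32 {
    DECOMPOSITION_MS + TIER_COMBINE_MS + RULE_EVAL_MS + AUDIT_WRITE_MS
}

/// Critical-path bound: the two analysis stages overlap, and the audit
/// write runs alongside the emit, so it adds nothing to the path.
pub fn critical_path_bound_ms() -> u32 {
    max(DECOMPOSITION_MS, TIER_COMBINE_MS) + RULE_EVAL_MS
}
```

The serial bound is 25ms and the critical-path bound is 15ms. Both sit well inside the 50ms budget.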

The pass-through path is faster. Sub-1ms is achievable when the Gate can reuse a cached evaluation for the same action signature.
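
A decision cache of this kind might look like the sketch below. The `Signature` fields are hypothetical. The one design point that matters is that the key has to cover every input the decision depends on, including the tier and a policy version, or the cache will serve stale decisions after a tier change or a rule update.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Pass,
    Hold,
    Block,
}

/// Hypothetical cache key: all inputs that can change the decision.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Signature {
    pub action_type: String,
    pub target_class: String,
    pub tier: u8,
    pub policy_version: u64,
}

pub struct DecisionCache {
    entries: HashMap<Signature, Mode>,
    pub hits: u64,
    pub misses: u64,
}

impl DecisionCache {
    pub fn new() -> Self {
        DecisionCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached mode, or runs `eval` once and stores the result.
    pub fn get_or_eval<F>(&mut self, sig: Signature, eval: F) -> Mode
    where
        F: FnOnce(&Signature) -> Mode,
    {
        if let Some(&mode) = self.entries.get(&sig) {
            self.hits += 1;
            return mode;
        }
        self.misses += 1;
        let mode = eval(&sig);
        self.entries.insert(sig, mode);
        mode
    }
}
```

Because the evaluation is deterministic, caching is safe by construction. A cached answer is the answer the function would have produced anyway.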

What this enables

A few specific properties follow from the Gate's architecture.

First, the kill switch is real. When VARP propagates a revocation, the Gate's policy rule set updates within a configurable window (default: under one second). All subsequent actions from the revoked agent are evaluated against the post-revocation rule set, which blocks them. The kill switch is not a hopeful UI element. It is a deterministic state change in the policy plane that propagates to all subsequent action evaluations.
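
The kill switch property can be illustrated with a few lines. This sketch does not model VARP or its propagation window. It only shows the state change on the Gate's side, using a hypothetical `RuleSet` type: once an agent is revoked, every later evaluation for that agent returns block.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Pass,
    Hold,
    Block,
}

/// Hypothetical rule set holding only the revocation state.
pub struct RuleSet {
    revoked_agents: HashSet<String>,
    pub version: u64,
}

impl RuleSet {
    pub fn new() -> Self {
        RuleSet { revoked_agents: HashSet::new(), version: 0 }
    }

    /// Applies a revocation. A newly revoked agent bumps the version so
    /// that version-keyed caches stop matching older entries.
    pub fn revoke(&mut self, agent_id: &str) {
        if self.revoked_agents.insert(agent_id.to_string()) {
            self.version += 1;
        }
    }

    /// Revocation overrides whatever the ordinary rules would have decided.
    pub fn evaluate(&self, agent_id: &str, ordinary_decision: Mode) -> Mode {
        if self.revoked_agents.contains(agent_id) {
            Mode::Block
        } else {
            ordinary_decision
        }
    }
}
```

There is no code path in which the agent, or anything the agent emits, calls `revoke` or undoes it.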

Second, audit is structurally complete. Every action that reaches the Gate produces an audit chain entry, regardless of mode. Pass-through is recorded. Hold is recorded. Confirmation is recorded. Block is recorded. There is no path for an action to be processed by the Gate without leaving an entry. A regulator examining the chain six months later sees every action the agent attempted, with the policy decision and the underlying state at the time.
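
One way to get "no path without an entry" is to make the audited wrapper the only entry point to the decision function. The sketch below shows that shape with hypothetical names. It keeps a plain in-memory log, which is an assumption for brevity and not a description of the Vault plane's audit chain.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Pass,
    Hold,
    Block,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditEntry {
    pub seq: u64,
    pub action_type: String,
    pub mode: Mode,
}

/// Wraps a decision function so that callers cannot reach it directly.
pub struct AuditedGate<F: Fn(&str) -> Mode> {
    decide: F,
    entries: Vec<AuditEntry>,
}

impl<F: Fn(&str) -> Mode> AuditedGate<F> {
    pub fn new(decide: F) -> Self {
        AuditedGate { decide, entries: Vec::new() }
    }

    /// The only way to get a decision. Every call appends one entry,
    /// whatever the mode.
    pub fn submit(&mut self, action_type: &str) -> Mode {
        let mode = (self.decide)(action_type);
        let seq = self.entries.len() as u64;
        self.entries.push(AuditEntry { seq, action_type: action_type.to_string(), mode });
        mode
    }

    pub fn entries(&self) -> &[AuditEntry] {
        &self.entries
    }
}
```

The decision function is a private field, so completeness of the log follows from the type's interface instead of from caller discipline.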

Third, the prompt is not in the trust path. Anything the model emits is data. The Gate does not trust the model's claims about why an action is appropriate. It evaluates the action against the policy. If the action passes, it passes regardless of the model's reasoning. If the action does not pass, no amount of model reasoning changes the outcome.

This is the architectural property we set out to produce. It is the reason we built the Gate as a separate plane with no cross-imports from the analysis plane. It is the reason we wrote it in Rust, with deterministic dispatch and no dynamic policy compilation. It is the reason it is small enough to read in an evening.

What you can do with this

If you are evaluating Vigil for an enterprise deployment, the Gate is the part of the architecture you should ask hardest about. We will walk you through the source. The rule language is documented. The policy rule sets are inspectable. The decision function is traceable for any given action.

If you are building a defense product yourself, this is the architectural pattern that survives. Probabilistic detection feeding deterministic enforcement on a different surface than the one being attacked. The implementation specifics will differ for your stack. The architecture should not.

The Gate is the smallest, most-tested, most-audited crate in our codebase. That is deliberate. It is the part of the system that, if it ever fails, fails closed: into a configuration where the agent stops being able to act. We would rather it be small and right than large and clever.
