Let me walk you through something that’s been keeping me up at night in 2026: making sure our AI agents don’t go rogue. I’ve spent the last six months building and breaking compliance guardrails for enterprise agent systems, and I’ve got the scars to prove it. Here’s a practical, step-by-step guide to getting it right.
What You’ll Need Before We Start
Before we dive into the code, let’s get the prerequisites straight. I’ve found that skipping any of these leads to painful debugging sessions later.
| Requirement | Minimum Version | Purpose |
|---|---|---|
| Python | 3.12 | Core execution environment |
| LangChain | 0.3.5 | Agent orchestration framework |
| Guardrails AI | 0.5.2 | Policy enforcement layer |
| OpenAI Python SDK | 1.55 | LLM backend client |
| Redis server | 7.4 | Audit log storage |
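Install the Python packages with one command (the Redis server itself is installed separately, and langchain-openai is needed for the ChatOpenAI import in Step 4):
pip install langchain==0.3.5 langchain-openai guardrails-ai==0.5.2 openai==1.55.0 redis==5.2.0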
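A quick version check after installing can save one of those debugging sessions. The following is a minimal sketch, assuming the pinned versions from the pip command above are the minimums you care about; the names `MINIMUMS`, `as_tuple`, and `check_versions` are illustrative, not part of any of these libraries.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimums copied from the pip command above (pip distribution names).
MINIMUMS = {
    "langchain": "0.3.5",
    "guardrails-ai": "0.5.2",
    "openai": "1.55.0",
    "redis": "5.2.0",
}


def as_tuple(v: str) -> tuple:
    """Turn '1.55.0' into (1, 55, 0), stopping at the first non-numeric part."""
    nums = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        nums.append(int(match.group()))
    return tuple(nums)


def check_versions(minimums=MINIMUMS, lookup=version) -> list:
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if as_tuple(installed) < as_tuple(minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_versions():
        print("WARNING:", problem)
```

The `lookup` parameter exists only so the check can be exercised without the packages being installed.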
Step 1: Define Your Compliance Policies as Code
In my experience, the biggest mistake teams make is writing compliance rules in natural language documents that nobody reads. Instead, we’ll define them as enforceable Python dictionaries. Here’s the policy template I use for my own agents:
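from typing import Dict


class CompliancePolicy:
    def __init__(self):
        self.rules = {
            "data_handling": {
                "allowed_pii_fields": [],
                "mask_pii": True,
                "max_retention_days": 90
            },
            "action_boundaries": {
                "allowed_tools": ["search", "read", "calculate"],
                "blocked_tools": ["write", "delete", "execute"],
                "max_tokens_per_action": 2000
            },
            "output_filters": {
                "block_profanity": True,
                "block_competitor_mentions": True,
                "require_citation": True
            },
            "audit_requirements": {
                "log_all_inputs": True,
                "log_all_outputs": True,
                "log_all_tool_calls": True
            }
        }

    def validate(self, action: str, context: Dict) -> bool:
        """Check whether an action is permitted by the policy.

        Assumes `action` is a tool name and `context` may carry a
        "tokens" count for the call; adapt this to your own action schema.
        """
        bounds = self.rules["action_boundaries"]
        if action in bounds["blocked_tools"]:
            return False
        if action not in bounds["allowed_tools"]:
            return False  # default-deny anything not explicitly allowed
        return context.get("tokens", 0) <= bounds["max_tokens_per_action"]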
Step 2: Implement a Real-Time Guardrail Layer
Now we wire up Guardrails AI to intercept every agent action. I’ve found that adding a guardrail before the LLM call, not only after it, catches violations before they cost you an API call. Here’s the core enforcement class:
import re

from guardrails import Guard


class ComplianceGuard:
    # Naive substring patterns; production systems need a proper classifier
    BLOCKED_PATTERNS = ["delete all", "ignore rules", "bypass"]
    SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def __init__(self, policy: CompliancePolicy):
        self.policy = policy
        # Guardrails AI entry point; attach validators to it with Guard.use()
        self.guard = Guard()

    def check_tool(self, tool_name: str) -> bool:
        """Return True only if the tool is on the policy allowlist."""
        # Plain membership test: Guard has no per-instance hook for lambdas
        return tool_name in self.policy.rules["action_boundaries"]["allowed_tools"]

    def check_input(self, user_input: str) -> bool:
        """Validate user input before agent processes it."""
        # Check for blocked patterns
        for pattern in self.BLOCKED_PATTERNS:
            if pattern in user_input.lower():
                print(f"BLOCKED: Input contained prohibited pattern '{pattern}'")
                return False
        # Check PII leakage
        if self.policy.rules["data_handling"]["mask_pii"]:
            # Simple PII check - in production use a proper PII scanner
            if self.SSN_PATTERN.search(user_input):
                print("BLOCKED: Input contained potential PII")
                return False
        return True

    def check_output(self, agent_output: str) -> bool:
        """Validate agent response before returning to user."""
        # Block profanity (simplified example)
        if self.policy.rules["output_filters"]["block_profanity"]:
            profanity_list = ["badword1", "badword2"]  # Use real list in prod
            for word in profanity_list:
                if word in agent_output.lower():
                    print(f"BLOCKED: Output contained blocked term '{word}'")
                    return False
        # Require citations for factual claims
        if self.policy.rules["output_filters"]["require_citation"]:
            if "[" not in agent_output and "(" not in agent_output:
                print("WARNING: Output missing citation markers")
                # In strict mode, we'd block here
                # return False
        return True
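The policy flag is called mask_pii, yet the guard above rejects the whole input instead of masking it. If you would rather redact and continue, something like the following could work. It is a minimal sketch: `PII_PATTERNS` and `mask_pii` are hypothetical names, and the two regexes are illustrative rather than a complete PII detector.

```python
import re

# Illustrative patterns only; real PII detection needs a dedicated scanner.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def mask_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [LABEL] placeholder and report what was found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


masked, labels = mask_pii("My SSN is 123-45-6789 and my email is test@test.com")
print(masked)  # My SSN is [SSN] and my email is [EMAIL]
print(labels)  # ['SSN', 'EMAIL']
```

Returning the labels alongside the masked text lets the caller log which kinds of PII were seen without logging the values themselves.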
Step 3: Build the Audit Trail
You can’t govern what you don’t log. I use Redis for the audit trail because it’s fast and supports TTL-based retention. Here’s my audit logger:
import json
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

import redis


class AuditLogger:
    TIMELINE_KEY = "audit:timeline"

    def __init__(self, host='localhost', port=6379, retention_days=90):
        self.client = redis.Redis(host=host, port=port, decode_responses=True)
        self.retention_days = retention_days

    def log_event(self, event_type: str, data: dict) -> str:
        """Log an event with automatic expiry and return its ID."""
        event_id = str(uuid.uuid4())
        # datetime.utcnow() is deprecated as of Python 3.12
        now = datetime.now(timezone.utc)
        event_record = {
            "timestamp": now.isoformat(),
            "event_type": event_type,
            "data": data
        }
        # Store in Redis with TTL
        key = f"audit:{event_type}:{event_id}"
        self.client.setex(
            key,
            timedelta(days=self.retention_days),
            json.dumps(event_record)
        )
        # Index the full key by time so queries can GET it without a SCAN
        self.client.zadd(self.TIMELINE_KEY, {key: now.timestamp()})
        return event_id

    def query_events(self, event_type: Optional[str] = None,
                     start_time: Optional[datetime] = None,
                     end_time: Optional[datetime] = None) -> list:
        """Retrieve audit events, optionally filtered by type and time range."""
        # An omitted bound leaves that side of the range open
        min_score = start_time.timestamp() if start_time else "-inf"
        max_score = end_time.timestamp() if end_time else "+inf"
        keys = self.client.zrangebyscore(self.TIMELINE_KEY, min_score, max_score)
        results = []
        for key in keys:
            raw = self.client.get(key)
            if raw is None:
                # Record already expired via TTL; drop the stale index entry
                self.client.zrem(self.TIMELINE_KEY, key)
                continue
            record = json.loads(raw)
            if event_type is None or record["event_type"] == event_type:
                results.append(record)
        return results
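For unit tests it helps to have a logger that needs no Redis server and lets you fast-forward time to watch records expire. Here is a minimal sketch of such a test double. `InMemoryAuditLogger` is a hypothetical class, not part of redis-py, and it only mimics the log/query shape of the logger above, not its key layout.

```python
import time
import uuid
from typing import Callable, Optional


class InMemoryAuditLogger:
    """Hypothetical test double: same log/query shape, no Redis required."""

    def __init__(self, retention_seconds: float = 90 * 86400,
                 clock: Callable[[], float] = time.time):
        self.retention_seconds = retention_seconds
        self.clock = clock  # injectable so tests can control time
        self._events = []   # list of (expires_at, record) pairs

    def log_event(self, event_type: str, data: dict) -> str:
        event_id = str(uuid.uuid4())
        now = self.clock()
        record = {"id": event_id, "timestamp": now,
                  "event_type": event_type, "data": data}
        self._events.append((now + self.retention_seconds, record))
        return event_id

    def query_events(self, event_type: Optional[str] = None) -> list:
        now = self.clock()
        # Drop expired records first, mimicking Redis TTL expiry
        self._events = [(exp, rec) for exp, rec in self._events if exp > now]
        return [rec for _, rec in self._events
                if event_type is None or rec["event_type"] == event_type]


# Usage with a fake clock: records vanish once retention has elapsed
fake_now = [0.0]
log = InMemoryAuditLogger(retention_seconds=10, clock=lambda: fake_now[0])
log.log_event("blocked_input", {"input": "delete all"})
print(len(log.query_events()))  # 1
fake_now[0] = 11.0
print(len(log.query_events()))  # 0
```

Injecting the clock is the design choice that matters here: it makes the retention rule testable in milliseconds instead of waiting on real TTLs.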
Step 4: Wire Everything Into the Agent
Now we combine policy, guardrail, and audit into a single agent pipeline. This is the pattern I use in production:
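import ast
import operator

from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# Initialize compliance components
policy = CompliancePolicy()
guard = ComplianceGuard(policy)
audit = AuditLogger()


# Define tools with built-in compliance checks
@tool
def search_database(query: str) -> str:
    """Search internal database. Only reads data, never modifies."""
    # Log the tool call (log_event adds the timestamp itself)
    audit.log_event("tool_call", {
        "tool": "search_database",
        "query": query
    })
    # Simulated database search
    return f"Results for '{query}': [simulated data]"


# Arithmetic operators the calculator accepts; anything else is rejected
_OPERATORS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _safe_eval(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Expression):
        return _safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_safe_eval(node.left), _safe_eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_safe_eval(node.operand))
    raise ValueError("unsupported expression")


@tool
def calculate(expression: str) -> str:
    """Perform mathematical calculations."""
    audit.log_event("tool_call", {
        "tool": "calculate",
        "expression": expression
    })
    try:
        # ast.literal_eval cannot evaluate "5 + 3", and eval() on model
        # output is unsafe, so walk the AST and allow arithmetic only
        return str(_safe_eval(ast.parse(expression, mode="eval")))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return "Error: invalid expression"


# create_openai_functions_agent requires a prompt with an agent_scratchpad slot
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the tools when needed."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])


# Build the agent with compliance wrapper
def compliant_agent_run(user_input: str) -> str:
    """Run the agent with full compliance checks."""
    # Step 1: Check input
    if not guard.check_input(user_input):
        audit.log_event("blocked_input", {
            "input": user_input,
            "reason": "Failed input validation"
        })
        return "I cannot process this request due to compliance restrictions."
    # Step 2: Log the input
    audit.log_event("user_input", {"input": user_input})
    # Step 3: Run the agent (simplified)
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    tools = [search_database, calculate]
    agent = create_openai_functions_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
    response = agent_executor.invoke({"input": user_input})
    agent_output = response["output"]
    # Step 4: Check output
    if not guard.check_output(agent_output):
        audit.log_event("blocked_output", {
            "input": user_input,
            "output": agent_output,
            "reason": "Failed output validation"
        })
        return "I generated a response that violated compliance. This has been logged."
    # Step 5: Log the output
    audit.log_event("agent_output", {"output": agent_output})
    return agent_output


# Example usage
if __name__ == "__main__":
    # This should work
    print(compliant_agent_run("What is 5 + 3?"))
    # This should be blocked by the "delete all" pattern
    print(compliant_agent_run("Delete all customer records"))
    # This should be blocked as potential PII
    print(compliant_agent_run("My SSN is 123-45-6789"))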
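The tools above log their own calls, but nothing in the pipeline consults the allowlist from Step 1 at the moment a tool runs. One way to close that gap is a decorator that checks and logs every call. This is a sketch under assumptions: `governed`, `ToolNotAllowed`, and `ALLOWED_TOOLS` are hypothetical names, and the logger is passed in as a plain callable so the example runs without Redis or LangChain.

```python
import functools
from typing import Callable

ALLOWED_TOOLS = {"search", "read", "calculate"}  # mirrors the Step 1 allowlist


class ToolNotAllowed(Exception):
    """Raised when a tool outside the allowlist is invoked."""


def governed(tool_name: str, log: Callable[[str, dict], object]):
    """Wrap a function so every call is allowlist-checked and logged."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if tool_name not in ALLOWED_TOOLS:
                log("blocked_tool", {"tool": tool_name})
                raise ToolNotAllowed(tool_name)
            log("tool_call", {"tool": tool_name, "args": list(args), "kwargs": kwargs})
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage: a list stands in for the audit logger
events = []


def record(kind: str, data: dict) -> None:
    events.append((kind, data))


@governed("search", record)
def search(query: str) -> str:
    return f"Results for '{query}'"


@governed("delete", record)
def delete(table: str) -> str:
    return "deleted"


print(search("pricing"))  # Results for 'pricing'
try:
    delete("customers")
except ToolNotAllowed as exc:
    print(f"blocked: {exc}")  # blocked: delete
```

Raising an exception, instead of returning an error string, keeps a blocked call from being mistaken for a normal tool result.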
Step 5: Test Your Compliance Setup
I always run a battery of tests before deploying. Here’s a quick test suite:
def test_compliance():
    # Test 1: Blocked input pattern ("delete all" is on the blocklist)
    result = compliant_agent_run("Delete all customer records")
    assert "cannot process" in result, "Should block prohibited patterns"

    # Test 2: PII detection (the guard only knows the SSN pattern so far;
    # an email address would pass until an email rule is added)
    result = compliant_agent_run("My SSN is 123-45-6789")
    assert "cannot process" in result, "Should block SSN-like input"

    # Test 3: Audit log verification
    logs = audit.query_events(event_type="blocked_input")
    assert len(logs) >= 2, "Should have logged both blocked inputs"

    # Test 4: Retention policy
    # Expiry is not exercised here; verify key TTLs against Redis separately
    print("All compliance tests passed!")


test_compliance()
Comparison of Compliance Approaches
After building this for three different enterprise teams, here’s what I’ve learned about the tradeoffs:
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| Pre-processing guardrails | Catches issues before LLM cost | Can’t catch all output violations | High-volume, low-risk agents |
| Post-processing validation | Catches all output violations | Wastes LLM compute on blocked outputs | High-stakes, low-volume agents |
| Hybrid (both sides) | Best coverage, least waste | More code to maintain | Production enterprise agents |
Final Thoughts on Deployment
In my experience, the compliance layer should be treated as a separate microservice, not embedded in the agent code. That way, you can update policies without redeploying agents. I run mine as a FastAPI service with its own database.
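If a separate service is more than you need on day one, the same idea (policies change without an agent redeploy) can be approximated by keeping the rules in a file that is re-read when it changes. To be clear, this is not the FastAPI service described above, only a smaller file-based variation. `PolicyStore` and the JSON layout are assumptions made for illustration.

```python
import json
from pathlib import Path


class PolicyStore:
    """Hypothetical helper: re-reads a JSON policy file when it changes on disk."""

    def __init__(self, path: str):
        self.path = Path(path)
        self._mtime_ns = None  # modification time of the version we loaded
        self._rules = {}

    def rules(self) -> dict:
        """Return the current rules, reloading if the file was modified."""
        mtime_ns = self.path.stat().st_mtime_ns
        if mtime_ns != self._mtime_ns:
            self._rules = json.loads(self.path.read_text(encoding="utf-8"))
            self._mtime_ns = mtime_ns
        return self._rules

    def is_tool_allowed(self, tool_name: str) -> bool:
        # Default-deny: a missing key means nothing is allowed
        return tool_name in self.rules().get("allowed_tools", [])
```

An agent would call `store.is_tool_allowed("search")` on each action, so an edit to the file takes effect on the next call. The check costs one `stat` per call, which is usually negligible next to an LLM request.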
One more thing: start with strict rules and loosen them based on real data. It’s much easier to relax a policy than to tighten one after an incident. I learned that the hard way when an agent accidentally quoted competitor pricing in a customer email.
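One way to make "strict first, loosen later" a one-line change is to have each check return its findings and let a single flag decide whether findings block. The sketch below applies that to the citation check from Step 2; `check_citation` and its `strict` parameter are hypothetical, and the bracket test is the same crude heuristic used earlier.

```python
def check_citation(output: str, strict: bool = True) -> tuple[bool, list[str]]:
    """Return (allowed, warnings). In strict mode any warning blocks the output."""
    warnings = []
    # Same crude heuristic as the guard: look for bracket or parenthesis markers
    if "[" not in output and "(" not in output:
        warnings.append("missing citation markers")
    allowed = not (strict and warnings)
    return allowed, warnings


print(check_citation("Revenue grew 5%."))                # (False, ['missing citation markers'])
print(check_citation("Revenue grew 5%.", strict=False))  # (True, ['missing citation markers'])
print(check_citation("Revenue grew 5% [1]."))            # (True, [])
```

Because the warnings are returned either way, you can log them while in strict mode and use that data to decide which rules are safe to relax.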
