
Slack Input/Output Guardrails

Every user prompt entering CAIPE through Slack and every LLM response leaving CAIPE back to Slack passes through a guardrail layer. Input guardrails protect the LLM and downstream systems from malicious, sensitive, or out-of-policy content. Output guardrails prevent the LLM from leaking secrets, PII, hallucinated actions, or harmful content back into Slack channels.


Guardrail Placement in the Pipeline

The guardrails sit inside the Slack Bot Backend Server, wrapping the A2A call to the CAIPE Orchestrator. The input guardrail is the last checkpoint before a prompt reaches the LLM, and the output guardrail is the first checkpoint a response passes through on its way back to Slack.

Insertion Points in Code

Both guardrails are centralized in utils/ai.py inside stream_a2a_response(), so every code path (mentions, DMs, Q&A, AI alerts, retries) passes through them:

| Guardrail | Location | Runs Before | Runs After |
|---|---|---|---|
| Input | Start of stream_a2a_response() | a2a_client.send_message_stream() | Prompt assembly (extract_message_text + build_thread_context) |
| Output | After _get_final_text() | _stream_final_response() / _post_final_response() | _check_overthink_skip() and confidence marker stripping |
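The ordering in the table can be sketched as a single function. This is a simplified, synchronous sketch under stated assumptions: the real stream_a2a_response() streams asynchronously and its signature is not documented here, so the parameters below (check_input, check_output, post_final_response) and the GuardrailResult type are hypothetical stand-ins, injected as callables to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class GuardrailResult:
    # Hypothetical result type shared by both guardrails.
    allowed: bool            # False: block (input) or replace (output)
    text: str                # text after any redaction or sanitization
    user_message: str = ""   # what Slack shows when allowed is False


def stream_a2a_response(
    prompt: str,  # already assembled (message text + thread context)
    send_message_stream: Callable[[str], Iterable[str]],  # stands in for a2a_client.send_message_stream()
    check_input: Callable[[str], GuardrailResult],
    check_output: Callable[[str], GuardrailResult],
    post_final_response: Callable[[str], None],  # stands in for _post_final_response()
) -> None:
    # Input guardrail: after prompt assembly, before the A2A call.
    verdict = check_input(prompt)
    if not verdict.allowed:
        post_final_response(verdict.user_message)
        return  # a blocked prompt never reaches the orchestrator

    # Collect streamed chunks into the final text (stands in for _get_final_text()).
    final_text = "".join(send_message_stream(verdict.text))

    # Overthink-skip checks and confidence-marker stripping would run here.

    # Output guardrail: last step before anything is posted to Slack.
    checked = check_output(final_text)
    post_final_response(checked.text if checked.allowed else checked.user_message)
```

Injecting the A2A client and the guardrails as callables is only a testing convenience; it makes it easy to verify that a blocked prompt never triggers the send call.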

Input Guardrails

Input guardrails validate and sanitize every user prompt before it is sent to the CAIPE Orchestrator via A2A. A blocked input never reaches the LLM.

Input Guardrail Details

| Guardrail | Action on Detect | Response to User | Logged |
|---|---|---|---|
| Length & Complexity | Block | "Your message exceeds the maximum length. Please shorten it." | Message length, user_id |
| Secrets Detection | Block | "Your message appears to contain a secret or credential. Please remove it and try again." | Detection type (no secret value) |
| PII Detection | Redact + Warn | PII replaced with [REDACTED], user warned: "I detected and removed personal information from your message." | Detection type, field count |
| Prompt Injection | Block | "I wasn't able to process that request." (generic, no details) | Full classification, user_id, channel_id |
| Content Policy | Block | "That request falls outside what I can help with." | Policy category, user_id |
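A minimal sketch of the first three input guardrails, assuming regex-based detection. The patterns are illustrative only (AWS access key IDs, classic GitHub personal access tokens, PEM private-key headers, US SSNs), token length is approximated by splitting on whitespace rather than with a real tokenizer, thread depth is not checked, and check_input and InputVerdict are hypothetical names. The user-facing messages and logged fields follow the table.

```python
import re
from dataclasses import dataclass, field

MAX_TOKENS = 4096  # from the `length` entry in the configuration

# Illustrative patterns only; a production scanner needs far broader coverage.
SECRET_PATTERNS = {
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class InputVerdict:
    action: str                  # "pass", "block" or "redact"
    text: str                    # prompt to forward (redacted if needed)
    user_message: str = ""       # reply shown in Slack, if any
    log: dict = field(default_factory=dict)  # audit fields, never raw values


def check_input(prompt: str, user_id: str) -> InputVerdict:
    # Length: a whitespace split is a crude stand-in for a tokenizer.
    if len(prompt.split()) > MAX_TOKENS:
        return InputVerdict(
            "block", "",
            "Your message exceeds the maximum length. Please shorten it.",
            {"length": len(prompt), "user_id": user_id})

    # Secrets: block, and log only which kind of secret matched.
    for kind, pattern in SECRET_PATTERNS.items():
        if pattern.search(prompt):
            return InputVerdict(
                "block", "",
                "Your message appears to contain a secret or credential. "
                "Please remove it and try again.",
                {"detection": kind})

    # PII: redact in place and warn instead of blocking.
    redacted, count, kinds = prompt, 0, []
    for kind, pattern in PII_PATTERNS.items():
        redacted, n = pattern.subn("[REDACTED]", redacted)
        if n:
            count += n
            kinds.append(kind)
    if count:
        return InputVerdict(
            "redact", redacted,
            "I detected and removed personal information from your message.",
            {"detection": kinds, "field_count": count})

    return InputVerdict("pass", prompt)
```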

Prompt Injection Patterns Detected

The injection detector identifies attempts to manipulate the LLM's system prompt or behavior.
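The configuration below sets the injection guardrail to a classifier with a 0.85 threshold. A minimal sketch of the block decision, assuming a hypothetical classify() callable that returns a label and a score, and assuming that a score at or above the threshold blocks. The full classification is logged with user_id and channel_id, while the user sees only the generic message from the table.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("guardrails.injection")  # hypothetical logger name

THRESHOLD = 0.85  # `threshold` from the injection entry in the configuration
GENERIC_REPLY = "I wasn't able to process that request."


@dataclass
class InjectionVerdict:
    blocked: bool
    user_message: str = ""


def check_injection(prompt: str, user_id: str, channel_id: str,
                    classify: Callable[[str], dict]) -> InjectionVerdict:
    # `classify` is hypothetical; assumed to return e.g.
    # {"label": "injection", "score": 0.93}.
    result = classify(prompt)
    if result.get("label") == "injection" and result.get("score", 0.0) >= THRESHOLD:
        # Full classification goes to the log; the user gets no details.
        logger.warning("prompt injection blocked", extra={
            "classification": result,
            "user_id": user_id,
            "channel_id": channel_id,
        })
        return InjectionVerdict(True, GENERIC_REPLY)
    return InjectionVerdict(False)
```

Keeping the reply generic avoids telling an attacker which phrasing tripped the detector.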


Output Guardrails

Output guardrails validate every LLM response before it is posted to Slack. They protect against data leakage, hallucinated actions, and policy violations in the model's output.

Output Guardrail Details

| Guardrail | Action on Detect | What User Sees | Logged |
|---|---|---|---|
| Secrets & Credential Scan | Redact in-place | Secrets replaced with [CREDENTIAL REDACTED] | Detection type (no secret value) |
| PII Leak Detection | Redact in-place | PII replaced with [REDACTED] | Field type, count |
| Hallucination Markers | Flag with disclaimer | Response posted with: "⚠️ Some information in this response could not be verified." | Flagged segments |
| Content Safety | Replace entire response | "I'm unable to provide a response for this request. Please rephrase or ask something else." | Policy category |
| Format & Sanitization | Sanitize in-place | Clean output (safe mrkdwn, validated links) | Sanitization count |
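A minimal sketch of the output chain, run in the order the configuration lists it. Everything here is an assumption for illustration: the regexes stand in for real credential and PII scanners (email addresses serve as the example PII), the hallucination and content-safety checks are hypothetical callables, and the format step applies Slack's rule that &, < and > are control characters in mrkdwn. It keeps only https:// links and neutralizes other control sequences such as <!channel>.

```python
import re
from typing import Callable

# Illustrative detectors; real scanners need much broader coverage.
CREDENTIAL = re.compile(r"\bAKIA[0-9A-Z]{16}\b|\bghp_[A-Za-z0-9]{36}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SLACK_LINK = re.compile(r"<([^<>|\s]+)(?:\|([^<>]*))?>")  # <target> or <target|label>

DISCLAIMER = "⚠️ Some information in this response could not be verified."
REPLACEMENT = ("I'm unable to provide a response for this request. "
               "Please rephrase or ask something else.")


def escape_mrkdwn(s: str) -> str:
    # Escape & first so the entities added for < and > are not re-escaped.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def sanitize_mrkdwn(text: str) -> tuple[str, int]:
    """Escape stray control characters; keep only https:// links."""
    out, last, changes = [], 0, 0
    for m in SLACK_LINK.finditer(text):
        out.append(escape_mrkdwn(text[last:m.start()]))
        target, label = m.group(1), m.group(2)
        if target.startswith("https://"):
            out.append(f"<{target}|{escape_mrkdwn(label)}>" if label else f"<{target}>")
        else:
            # Other schemes and mentions like <!channel> become plain text.
            out.append(escape_mrkdwn(label or target))
            changes += 1
        last = m.end()
    out.append(escape_mrkdwn(text[last:]))
    return "".join(out), changes


def check_output(text: str,
                 is_unverified: Callable[[str], bool],
                 is_unsafe: Callable[[str], bool]) -> tuple[str, dict]:
    """Run secrets -> pii -> hallucination -> content_safety -> format."""
    log: dict = {}
    text, log["secrets"] = CREDENTIAL.subn("[CREDENTIAL REDACTED]", text)
    text, log["pii"] = EMAIL.subn("[REDACTED]", text)
    if is_unverified(text):
        text = f"{text}\n\n{DISCLAIMER}"
        log["hallucination"] = 1
    if is_unsafe(text):
        # The fixed replacement string is already safe, so skip formatting.
        return REPLACEMENT, {**log, "content_safety": 1}
    text, log["format"] = sanitize_mrkdwn(text)
    return text, log
```

Requiring the link target to contain no whitespace keeps ordinary comparisons such as "a < b > c" from being mistaken for links; they are escaped instead.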

Full Sequence with Guardrails

This sequence diagram shows the complete flow from Slack message to Slack response, with both guardrail layers highlighted.


Guardrail Architecture Patterns

Pattern 1: Middleware in stream_a2a_response()

The recommended pattern centralizes both guardrails in the single function that all Slack handlers call, so every code path (mentions, DMs, Q&A, AI alerts, retries) passes through the same guardrails.

Pattern 2: Pluggable Guardrail Chain

Each guardrail is a pluggable module that can be independently enabled, configured, or replaced. The chain is defined in configuration and executed sequentially.
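A pluggable chain can be sketched as a registry of guardrail factories, each built from one entry of the chain list and run in order. This sketch rests on assumptions: the registry, register(), build_chain() and run_chain() are hypothetical, only two guardrails are implemented, and the configuration is a Python dict mirroring the YAML layout rather than a parsed file. A block stops the chain; any other action passes its (possibly modified) text to the next guardrail.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    action: str  # "pass", "block", "redact", ...
    text: str


Guardrail = Callable[[str], Decision]
REGISTRY: dict[str, Callable[[dict], Guardrail]] = {}


def register(name: str):
    """Decorator that adds a guardrail factory to the registry."""
    def wrap(factory):
        REGISTRY[name] = factory
        return factory
    return wrap


@register("length")
def make_length(cfg: dict) -> Guardrail:
    limit = cfg.get("max_tokens", 4096)

    def check(text: str) -> Decision:
        # Whitespace split approximates a token count.
        return Decision("block" if len(text.split()) > limit else "pass", text)
    return check


@register("pii")
def make_pii(cfg: dict) -> Guardrail:
    ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def check(text: str) -> Decision:
        new, n = ssn.subn("[REDACTED]", text)
        return Decision(cfg.get("action", "redact") if n else "pass", new)
    return check


def build_chain(section: dict) -> list[tuple[str, Guardrail]]:
    if not section.get("enabled", True):
        return []  # a disabled section yields an empty chain
    chain = []
    for entry in section.get("chain", []):
        factory = REGISTRY.get(entry["name"])
        if factory is None:
            raise ValueError(f"unknown guardrail: {entry['name']}")
        options = {k: v for k, v in entry.items() if k != "name"}
        chain.append((entry["name"], factory(options)))
    return chain


def run_chain(chain, text: str) -> tuple[Optional[str], list[tuple[str, str]]]:
    trail = []
    for name, guardrail in chain:
        decision = guardrail(text)
        trail.append((name, decision.action))
        if decision.action == "block":
            return None, trail  # later guardrails never run
        text = decision.text    # later guardrails see the modified text
    return text, trail
```

Failing fast on an unknown name surfaces configuration typos at startup instead of silently skipping a guardrail.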

Configuration

```yaml
guardrails:
  input:
    enabled: true
    chain:
      - name: length
        max_tokens: 4096
        max_thread_depth: 20
      - name: secrets
        patterns: ["aws_key", "github_token", "stripe_key", "jwt", "private_key", "connection_string"]
        action: block
      - name: pii
        entities: ["ssn", "credit_card", "phone", "address"]
        action: redact
      - name: injection
        detection: classifier
        threshold: 0.85
        action: block
      - name: policy
        scope: platform-engineering
        action: block
  output:
    enabled: true
    chain:
      - name: secrets
        action: redact
      - name: pii
        action: redact
      - name: hallucination
        action: flag
      - name: content_safety
        action: replace
      - name: format
        action: sanitize
  logging:
    log_blocked: true
    log_redacted: true
    alert_threshold: 10  # alert after N blocks per user per hour
```
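The alert_threshold comment describes a per-user rate of blocks. One way to implement it, assuming a sliding one-hour window rather than calendar hours and an alert that fires once when the count first reaches the threshold, is a per-user deque of timestamps. BlockAlerter and its injectable clock are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class BlockAlerter:
    """Signal an alert when a user reaches `threshold` blocks within `window` seconds."""

    def __init__(self, threshold: int = 10, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._blocks: dict[str, deque] = defaultdict(deque)

    def record_block(self, user_id: str) -> bool:
        """Record one block; return True only when this block hits the threshold."""
        now = self.clock()
        events = self._blocks[user_id]
        events.append(now)
        # Sliding window: drop blocks older than `window` seconds.
        while events and now - events[0] >= self.window:
            events.popleft()
        # == rather than >= so a sustained burst raises one alert, not one per block.
        return len(events) == self.threshold
```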

Observability & Audit

Every guardrail decision is logged for audit, incident response, and guardrail tuning.
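A minimal sketch of one audit record per guardrail decision, assuming JSON lines emitted through Python's standard logging module. audit_record(), log_decision(), the field names and the guardrails.audit logger name are hypothetical. Detection types are recorded and matched values never are, consistent with the "no secret value" rule in the tables above.

```python
import json
import logging
import time

audit_logger = logging.getLogger("guardrails.audit")  # hypothetical logger name


def audit_record(stage: str, guardrail: str, result: str, user_id: str,
                 channel_id: str, detections=(), latency_s: float = 0.0) -> str:
    """Build one JSON audit line; only detection *types* are included."""
    record = {
        "ts": time.time(),
        "stage": stage,            # "input" or "output"
        "guardrail": guardrail,    # e.g. "secrets"
        "result": result,          # e.g. "pass", "block", "redact"
        "user_id": user_id,
        "channel_id": channel_id,
        "detections": sorted(set(detections)),  # de-duplicated type names
        "latency_seconds": round(latency_s, 6),
    }
    return json.dumps(record, sort_keys=True)


def log_decision(**kwargs) -> None:
    audit_logger.info(audit_record(**kwargs))
```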

Metrics

| Metric | Type | Labels |
|---|---|---|
| guardrail_input_total | Counter | result (pass/block/redact), guardrail, channel_id |
| guardrail_output_total | Counter | result (pass/redact/replace/flag), guardrail |
| guardrail_latency_seconds | Histogram | stage (input/output), guardrail |
| guardrail_blocked_per_user | Counter | user_id, guardrail |