CAIPE RBAC
Audience: Junior engineers getting oriented + security architects reviewing the design.
This is the canonical reference for how authentication and authorization work in CAIPE. Start with the feature guide if you need one linear explanation, then use the focused docs for deeper reference:
| If you want to… | Read |
|---|---|
| Explain the feature front to back to CAIPE users, admins, operators, or security reviewers | Enterprise RBAC and ReBAC Feature Guide |
| Understand each component (Keycloak, UI, Supervisor, AgentGateway, Dynamic Agents) and how they're wired | Architecture |
| Get the short end-to-end summary of the Comprehensive RBAC refactor, including Keycloak roles, AgentGateway, and OpenFGA | Comprehensive RBAC Refactor |
| Understand how JWT identity and OpenFGA relationship checks work together | JWT and OpenFGA |
| Understand exactly where OpenFGA union/computed permissions are evaluated, what gets stored vs. computed, and follow a worked end-to-end Probe-button example | OpenFGA Permission Evaluation |
| Trace a request: login, OBO token exchange, end-to-end Slack/Webex flow, Slack channel or Webex space → agent routing | Workflows |
| Log in, exercise a role, verify a denial, link a Slack/Webex user, run the demo | Usage |
| Find the file that owns a specific piece of the auth path | File map |
| Understand the difference between Keycloak roles and client scopes, what a slug is, and what happens when you create a team | Roles vs Scopes |
| Install or upgrade the RBAC/OpenFGA refactor with Helm, including optional Keycloak, AgentGateway, OpenFGA, and bridge runtime components | Helm installation and upgrade guide |
| Install CAIPE on a real K8s cluster: bootstrap admin, IdP, and slack-bot client secrets via dev defaults, manual K8s Secrets, or ESO (Vault / AWS-SM / GCP-SM) | Secrets bootstrap |
For the live caipe/rbac GitOps upgrade, see the CAIPE RBAC Helm Migration Guide.
Every component-level doc opens with a badge analogy to build intuition, followed by the precise technical detail. Read the analogy first, then the technical section; they describe the same thing at different levels of abstraction.
The Big Picture
Think of CAIPE like a secure corporate office building:
- Keycloak is HR + the front desk. It issues ID badges, manages who works here, and verifies contractors through a partner agency (your enterprise IdP, typically Okta or Duo SSO).
- Every service is a room with its own badge reader. You prove who you are once at the front desk, get a badge, and that badge is checked at every door, with no calling HR again each time.
- AgentGateway is the armed security checkpoint between the office and the server room. Everyone must show their badge, and the checkpoint calls OpenFGA through ext_authz for the PDP decision before proxying.
- Team Resources and OpenFGA ReBAC in the Admin UI are the rich ReBAC authoring surfaces: admins assign agents and MCP tool prefixes to a team, preview effective OpenFGA access, inspect all relationships in a full-screen graph, edit relationships on a drag/drop graph canvas, and inspect materialized tuples.
- CEL policy editing is retired for the management plane. Admins use OpenFGA/ReBAC relationships instead of editing AgentGateway or admin-tab CEL rules.
- Identity Group Sync maps enterprise groups into CAIPE team memberships using a hybrid source model: memberOf/groups claims refresh the signed-in user's memberships at login, while direct Okta directory queries power full admin dry-runs, removals, and drift detection.
- Slack channels and Webex spaces are external messaging rooms. They are not identity providers; each bot maps the messaging surface to a Keycloak user, exchanges an OBO token, and checks OpenFGA before dispatching.
- The badge itself is a JWT: a tamper-proof, digitally signed card that any badge reader can verify independently without phoning HR.
Technically: CAIPE uses OpenID Connect (OIDC) for authentication and JWT bearer tokens for stateless authorization across all service boundaries. There is one token issuer (Keycloak), and every service verifies tokens against Keycloak's published JWKS public keys, with no shared secrets and no per-hop re-authentication.
In 0.5.0, the umbrella Helm chart can deploy the RBAC runtime components (tags.keycloak, openfga.enabled, openfgaAuthzBridge.enabled, and agentgateway.enabled) so release consumers do not need separate companion app manifests for the default stack.
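The bot behaviour described in the list above (map the messaging user to a Keycloak user, exchange an OBO token, check OpenFGA, then dispatch) can be sketched roughly as follows. This is a minimal illustration, not CAIPE's bot code: the `BotGate` class, its three injected callables, the `can_use` relation, and the `user:`/`agent:` object naming are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BotGate:
    """Hypothetical gate a Slack/Webex bot could run before calling the supervisor."""

    # All three callables are assumed stand-ins for the real integrations.
    lookup_linked_user: Callable[[str], Optional[str]]  # messaging ID -> Keycloak sub
    exchange_obo_token: Callable[[str], str]            # Keycloak sub -> OBO token
    check_openfga: Callable[[str, str, str], bool]      # (user, relation, object)

    def authorize(self, messaging_user_id: str, agent_id: str) -> Optional[str]:
        """Return an OBO token if the request may be dispatched, else None."""
        keycloak_sub = self.lookup_linked_user(messaging_user_id)
        if keycloak_sub is None:
            return None  # unlinked users are blocked before the supervisor is called
        token = self.exchange_obo_token(keycloak_sub)
        allowed = self.check_openfga(
            f"user:{keycloak_sub}", "can_use", f"agent:{agent_id}"
        )
        return token if allowed else None


# In-memory fakes standing in for the link store, Keycloak, and OpenFGA.
links = {"U123": "a3f9b2c1"}
tuples = {("user:a3f9b2c1", "can_use", "agent:argocd")}
gate = BotGate(
    lookup_linked_user=links.get,
    exchange_obo_token=lambda sub: f"obo-token-for-{sub}",
    check_openfga=lambda user, relation, obj: (user, relation, obj) in tuples,
)

print(gate.authorize("U123", "argocd"))  # linked and permitted
print(gate.authorize("U999", "argocd"))  # unlinked user
print(gate.authorize("U123", "github"))  # linked, but no relationship
```

Injecting the three steps as callables keeps the gate testable without a running Keycloak or OpenFGA instance.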
┌──────────────────────────────────────────────────────────────────────────────┐
│                             CAIPE Trust Boundary                             │
│                                                                              │
│  ┌────────────┐  ┌──────────────┐  ┌─────────────┐  ┌──────────────┐         │
│  │  Keycloak  │  │   CAIPE UI   │  │ Supervisor  │  │   Dynamic    │         │
│  │ (OIDC IdP) │  │  (Next.js)   │  │ A2A Server  │  │   Agents     │         │
│  │ port 7080  │  │  port 3000   │  │ port 8000   │  │  port 8001   │         │
│  └────────────┘  └──────────────┘  └─────────────┘  └──────────────┘         │
│  Token issuer    NextAuth + RBAC   JwtUserContext   get_current_user         │
│  JWKS endpoint   middleware        middleware       FastAPI Depends          │
│  User profile    Session → API     contextvar       JWKS validation          │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────┐    │
│  │               AgentGateway (Policy Enforcement Point)                │    │
│  │           port 4000 · ext_authz→OpenFGA · JWT passthrough            │    │
│  └──────────────────────────────────────────────────────────────────────┘    │
│               │                        │                                     │
│               ▼                        │                                     │
│        ┌──────────────┐                │                                     │
│        │   OpenFGA    │                │                                     │
│        │  Remote PDP  │                │                                     │
│        └──────────────┘                │                                     │
│          ┌─────────────────────────────┼───────────────────┐                 │
│          ▼                             ▼                   ▼                 │
│    ┌───────────┐                 ┌───────────┐       ┌───────────┐           │
│    │  RAG MCP  │                 │ArgoCD MCP │       │GitHub MCP │ ...       │
│    │  Server   │                 │  Server   │       │  Server   │           │
│    └───────────┘                 └───────────┘       └───────────┘           │
│    JWKS validation at each MCP: tokens verified independently                │
└──────────────────────────────────────────────────────────────────────────────┘
Security properties the architecture is designed to guarantee:
| Property | How it's achieved |
|---|---|
| Single source of truth for identity | Keycloak is the only token issuer; all services verify against its JWKS |
| No credentials in transit between services | JWT is a signed assertion; no password or secret is passed between hops |
| User identity preserved end-to-end | The same JWT travels Slack/Webex Bot → Supervisor → AgentGateway → MCP unchanged |
| Delegation is auditable | OBO tokens carry act.sub (the acting party, i.e. the bot client) alongside sub (the real user) |
| Policy enforcement is centralised | AgentGateway is the single PEP for all MCP tool calls; tools don't implement their own authz |
| Remote PDP for relationships | AgentGateway extAuthz calls OpenFGA before proxying MCP traffic |
| Admin-configured ReBAC | Team Resources saves write OpenFGA team/agent/tool tuples from the same source of truth as Keycloak roles; OpenFGA ReBAC provides guided tuple creation, checks, full-screen all-relationship graph viewing, drag/drop graph editing, and tuple inspection |
| Group-to-team provenance | Identity Group Sync records whether a membership came from login claims, Okta sync, manual admin action, bootstrap, or policy rules |
| Least privilege at tool layer | OpenFGA ReBAC is the authoritative AgentGateway policy path; service-side checks provide defense in depth |
| Tenant isolation | tenant claim in JWT scopes data visible to the MCP server |
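To make the "delegation is auditable" property concrete, the sketch below derives an audit record from decoded claims. It is an assumption-laden illustration: the `delegation_audit_record` helper and the record's field names are hypothetical, and only the `sub`, `act.sub`, and `tenant` claims come from this document.

```python
from typing import Any, Dict


def delegation_audit_record(claims: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise who a token represents and who, if anyone, is acting for them."""
    actor = claims.get("act") or {}
    return {
        "user": claims["sub"],        # the real user
        "actor": actor.get("sub"),    # acting client on OBO tokens, otherwise None
        "delegated": "sub" in actor,  # True only when an act.sub claim is present
        "tenant": claims.get("tenant"),
    }


# Sample claims mirroring the example payload in this document.
obo_claims = {
    "sub": "a3f9b2c1-...",
    "tenant": "acme",
    "act": {"sub": "slack-bot-client"},
}
direct_claims = {"sub": "a3f9b2c1-...", "tenant": "acme"}

print(delegation_audit_record(obo_claims))
print(delegation_audit_record(direct_claims))
```

A token obtained by direct login has no `act` claim, so the record marks it as not delegated.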
Core Concept: The JWT
When you log in, Keycloak issues a JWT (JSON Web Token) signed with RS256 using its realm private key. It's a base64url-encoded envelope of three parts: header.payload.signature.
A decoded payload looks like this:
{
"iss": "http://localhost:7080/realms/caipe",
"sub": "a3f9b2c1-...",
"email": "alice@example.com",
"name": "Alice Smith",
"realm_access": {
"roles": ["admin", "chat_user"]
},
"resource_access": {
"caipe-ui": { "roles": ["uma_protection"] }
},
"tenant": "acme",
"exp": 1713200000,
"iat": 1713196400,
"act": {
"sub": "slack-bot-client"
}
}
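The header.payload.signature layout can be inspected with nothing more than base64url decoding. The following sketch builds a sample token with a placeholder signature and decodes it; it deliberately does not verify anything, so it is only suitable for debugging. The helper names and the `kid` value are illustrative assumptions.

```python
import base64
import json
from typing import Any, Dict, Tuple


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then base64url-decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_unverified(token: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a compact JWT and decode header and payload WITHOUT verifying it."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Build a sample token; the signature is a placeholder, not a real RS256 signature.
header = {"alg": "RS256", "typ": "JWT", "kid": "example-key"}
payload = {"iss": "http://localhost:7080/realms/caipe", "tenant": "acme"}
token = ".".join(
    [
        b64url_encode(json.dumps(header).encode()),
        b64url_encode(json.dumps(payload).encode()),
        b64url_encode(b"placeholder-signature"),
    ]
)

decoded_header, decoded_payload = decode_unverified(token)
print(decoded_header["alg"], decoded_payload["iss"])
```

Because anyone can decode (and re-encode) the first two segments, the claims are trustworthy only after the signature has been verified against the issuer's JWKS.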
Key fields for security architects:
| Claim | Purpose | Where it's enforced |
|---|---|---|
| iss | Token issuer; services reject tokens from unknown issuers | Dynamic agents JWKS validation, RAG server |
| sub | Opaque user ID (Keycloak UUID); stable, not guessable | Conversation ownership, audit logs |
| email | Human-readable identity, used for display and Slack linking | UI, supervisor user context |
| realm_access.roles | Realm-level role assignments | Dynamic agents is_admin, Web UI backend fallback checks, service-side defense in depth |
| exp | Token expiry; checked by each validator and protected from tampering by the signature | All JWKS validators, NextAuth refresh |
| act.sub | Delegation chain; set on OBO tokens only | Audit: proves bot acted on behalf of user |
| tenant | Multi-tenant data scoping | RAG server query isolation |
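The claim checks in the table can be sketched as plain functions that run after signature verification. This is a minimal illustration under stated assumptions: `check_claims` and `has_realm_role` are hypothetical helpers (not the project's `is_admin`), and the expected issuer is taken from the sample payload.

```python
import time
from typing import Any, Dict, Optional

# Issuer copied from the sample payload; a real deployment would configure this.
EXPECTED_ISSUER = "http://localhost:7080/realms/caipe"


def check_claims(claims: Dict[str, Any], now: Optional[float] = None) -> None:
    """Raise ValueError on a bad issuer or expiry. Call only after signature verification."""
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISSUER:
        raise ValueError("unknown issuer")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise ValueError("token expired or missing exp")


def has_realm_role(claims: Dict[str, Any], role: str) -> bool:
    """Check realm_access.roles, tolerating tokens that lack the claim."""
    return role in claims.get("realm_access", {}).get("roles", [])


sample_claims = {
    "iss": "http://localhost:7080/realms/caipe",
    "realm_access": {"roles": ["admin", "chat_user"]},
    "exp": 1713200000,
    "iat": 1713196400,
}

check_claims(sample_claims, now=1713196400)  # within the validity window
print(has_realm_role(sample_claims, "admin"))
```

Passing `now` explicitly keeps the expiry check deterministic in tests.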
Services never call Keycloak on each request. They validate the signature offline using the cached JWKS public key. JWKS is refreshed on cache miss (unknown kid) or on a TTL (1 hour).
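The refresh rule above (refetch on an unknown kid or after the TTL) could look roughly like this. The `JwksCache` class is a hypothetical sketch, not the validator used by CAIPE services; the fetch function and clock are injected so the example runs without a network call.

```python
import time
from typing import Any, Callable, Dict, Optional


class JwksCache:
    """Caches JWKS keys by kid; refetches on unknown kid or when the TTL has elapsed."""

    def __init__(
        self,
        fetch_jwks: Callable[[], Dict[str, Any]],
        ttl_seconds: float = 3600.0,  # 1 hour, matching the TTL stated above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_jwks
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Dict[str, Dict[str, Any]] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {key["kid"]: key for key in jwks.get("keys", [])}
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> Optional[Dict[str, Any]]:
        expired = (
            self._fetched_at is None
            or self._clock() - self._fetched_at >= self._ttl
        )
        if expired or kid not in self._keys:
            self._refresh()
        return self._keys.get(kid)  # None if the issuer does not publish this kid


# Fake fetcher and clock so the refresh behaviour can be observed.
fetch_times = []
fake_now = [0.0]


def fake_fetch() -> Dict[str, Any]:
    fetch_times.append(fake_now[0])
    return {"keys": [{"kid": "key-1", "kty": "RSA"}]}


cache = JwksCache(fake_fetch, clock=lambda: fake_now[0])
cache.get_key("key-1")  # first use: fetch
cache.get_key("key-1")  # served from cache
fake_now[0] = 3600.0
cache.get_key("key-1")  # TTL elapsed: fetch again
missing = cache.get_key("key-2")  # unknown kid: fetch again, still not found
print(len(fetch_times), missing)
```

One design caveat: this sketch refetches on every unknown kid, so a production validator would likely rate-limit those refreshes to avoid being driven to hammer the issuer with bogus key IDs.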
Threat Model Considerations
| Threat | Mitigation |
|---|---|
| JWT forgery | RS256 signature verified against Keycloak JWKS; private key never leaves Keycloak |
| JWT replay after expiry | exp claim enforced at every JWKS validation point |
| Token theft from browser | NextAuth stores tokens in an httpOnly, server-managed session cookie; the raw JWT is never exposed to browser JavaScript |
| Bot impersonating arbitrary user via OBO | Keycloak's token-exchange permission must be explicitly granted to the bot client; not available by default |
| Privilege escalation via claim manipulation | JWT is signed; any claim modification invalidates the RS256 signature |
| Tenant data leakage | tenant claim in JWT used for query scoping at MCP layer and service-side filters |
| PDP outage fail-open | AgentGateway extAuthz.failureMode.denyWithStatus=403 fails closed if OpenFGA/bridge is unavailable |
| AgentGateway admin exposure | Only the data-plane listener (4000) should be ingress-exposed; the admin listener (15000) remains private inside the cluster |
| Unlinked Slack/Webex users bypassing RBAC | Bot runtime gates block unlinked users before the supervisor is called |
| AUTH_ENABLED=false in production | Startup log emits a WARNING when auth is disabled; also documented in Architecture › Dynamic Agents env vars |
| Bootstrap admin left permanently enabled | No automatic enforcement; documented operational risk; must be removed post-setup |
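The fail-closed behaviour listed for PDP outages can be illustrated with a small sketch. In CAIPE this is configured on AgentGateway rather than written by hand; the `authorize_or_deny` function below is a hypothetical model of the semantics only, where any error from the policy decision point is treated as a denial.

```python
from typing import Callable, Tuple


def authorize_or_deny(pdp_check: Callable[[], bool]) -> Tuple[int, str]:
    """Return (status, reason). Any PDP error is treated as a denial (fail closed)."""
    try:
        allowed = pdp_check()
    except Exception:
        # An unreachable or failing PDP must never result in access being granted.
        return 403, "PDP unavailable: denied"
    if allowed:
        return 200, "allowed"
    return 403, "denied by policy"


def unreachable_pdp() -> bool:
    """Simulates the OpenFGA bridge being down."""
    raise ConnectionError("OpenFGA bridge unreachable")


print(authorize_or_deny(lambda: True))
print(authorize_or_deny(lambda: False))
print(authorize_or_deny(unreachable_pdp))
```

The important property is that the error path and the explicit-deny path return the same status, so an outage cannot be used to bypass policy.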
Where to next
- Architecture: Component-by-component reference for Keycloak, UI, Supervisor, AgentGateway, and Dynamic Agents.
- Workflows: Sequence diagrams for login, OBO, end-to-end requests, Slack channel and Webex space routing.
- Usage: Bring up the stack, log in as test users, verify RBAC denials, run the demo.
- File map: When you need to change something, this tells you which file to open.