Operator Guide: Enterprise RBAC (098)
Audience: Platform operators deploying CAIPE with Keycloak, Agent Gateway, and the CAIPE UI BFF.
Sources of truth: deploy/keycloak/realm-config.json, deploy/agentgateway/config.yaml, ui/src/lib/api-middleware.ts, ui/src/lib/rbac/, RAG server rbac.py.
1. Keycloak realm setup (caipe)
1.1 Import and dev stack
- Realm export: `deploy/keycloak/realm-config.json` is bind-mounted into the Keycloak container by `deploy/keycloak/docker-compose.yml` as `--import-realm` data (see that compose file for ports; quickstart uses `http://localhost:7080`).
- After import, verify that realm `caipe` is enabled and the clients exist (below).
1.2 Realm roles (global)
Defined under roles.realm in realm-config.json:
| Role | Purpose (from export descriptions) |
|---|---|
| admin | Full platform administration |
| chat_user | Invoke supervisor, tools, MCP, A2A, skills (baseline chat user) |
| team_member | Create/manage team-scoped RAG tools |
| kb_admin | KB administration and ingest |
| offline_access | Refresh tokens (OIDC) |
| uma_authorization | UMA / Authorization Services participation |
Note: denied in the permission matrix is a test persona (user with no chat roles), not a realm role in the export.
1.3 Per-resource and per-KB realm roles (conventions)
The export includes examples of fine-grained KB roles; production deployments add more the same way:
| Pattern | Meaning |
|---|---|
| kb_reader:<kb-id> | Read/query KB <kb-id> |
| kb_ingestor:<kb-id> | Ingest into <kb-id> |
| kb_admin:<kb-id> | Admin for <kb-id> |
| kb_reader:* | Read all KBs (wildcard) |
Agent / task / skill roles follow the spec (FR-028): agent_user:<id>, agent_admin:<id>, and analogously task_user:<id>, task_admin:<id>, skill_user:<id>, skill_admin:<id> with wildcards :* where appropriate. These are not all pre-created in realm-config.json; assign them via Admin UI / Keycloak Admin API when provisioning resources.
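The naming convention above can be checked with a few lines of string handling. The following is a minimal sketch, assuming roles arrive as plain strings and that `*` is the only wildcard form; the function names `role_grants` and `has_access` are hypothetical and do not come from the CAIPE code base.

```python
from typing import List


def role_grants(held_role: str, needed_prefix: str, resource_id: str) -> bool:
    """Return True if one scoped role (e.g. "kb_reader:docs") covers the resource."""
    prefix, sep, target = held_role.partition(":")
    if not sep or prefix != needed_prefix:
        # Global roles without a ":<id>" suffix are out of scope for this sketch.
        return False
    return target == "*" or target == resource_id


def has_access(roles: List[str], needed_prefix: str, resource_id: str) -> bool:
    """Return True if any held role grants the needed prefix on the resource."""
    return any(role_grants(r, needed_prefix, resource_id) for r in roles)
```

For example, `has_access(["kb_reader:*"], "kb_reader", "docs")` is true, while a user holding only `kb_reader:hr` is not granted read access to `docs`.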
1.4 Keycloak Authorization Services resources
Client caipe-platform has authorizationServicesEnabled: true and defines resources (type caipe:component) with scopes:
| Resource | Scopes (subset) |
|---|---|
| admin_ui | view, configure, admin, audit.view |
| slack | view, invoke, admin |
| supervisor | invoke, configure, admin |
| rag | query, ingest, admin, tool.create, tool.update, tool.delete, tool.view, kb.admin, kb.ingest, kb.query |
| sub_agent | invoke, configure, admin |
| tool | invoke, configure, admin |
| skill | view, invoke, configure, delete |
| a2a | create, view, configure, delete, admin |
| mcp | invoke, view, admin |
Policies in the export map realm roles to these scopes (e.g. admin-role-policy, chat-user-role-policy, team-member-role-policy, kb-admin-role-policy plus composite scope policies such as rag-query-access, rag-team-tool-access, slack-access). Operators should extend policies when product matrix rows require roles beyond what the sample export grants (see permission-matrix.md § Keycloak export alignment).
1.5 Clients
| Client ID | Purpose | Notes from export |
|---|---|---|
| caipe-ui | Next.js / NextAuth OIDC | Confidential, standard flow, authorizationServicesEnabled: false, redirect http://localhost:3000/* (adjust for prod) |
| caipe-platform | Resource server + PDP for UMA | Authorization Services enabled; used as audience for permission checks and Agent Gateway JWT audience |
| caipe-slack-bot | Bot service account + OBO | serviceAccountsEnabled: true, standardFlowEnabled: false, directAccessGrantsEnabled: false, attribute oidc.token.exchange.enabled: true |
1.6 Client scopes and protocol mappers
Default realm client scopes (defaultDefaultClientScopes): profile, email, roles, groups, org.
Important mappers:
- `roles` scope → `realm-roles` → JWT claim `roles` (multivalued string), also on userinfo/id token per mapper config.
- `groups` scope → maps user attribute `idp_groups` → claim `groups` (FR-010; populated by IdP / broker mappers).
- `org` scope → user attribute `org` → claim `org` (tenant hint, FR-020).
- `profile` scope → includes `caipe-audience` mapper adding custom audience `caipe-platform` to tokens so resource-server and AG validation can accept them.
Identity provider mappers (Okta / Entra examples in export) illustrate importing groups into idp_groups and optional hardcoded role assignment from IdP group values.
1.7 Sample users
realm-config.json includes seed users (e.g. admin@example.com, standard@example.com, kbadmin@example.com, denied@example.com, orgb@example.com) with differing realm roles for testing. Change their passwords before any non-local use.
2. Agent Gateway deployment
2.1 Layout
- Compose: `deploy/agentgateway/docker-compose.yml`
- Config: `deploy/agentgateway/config.yaml`
2.2 JWT validation (strict mode)
From config.yaml:
- Listener `jwtAuth`:
  - `mode: strict`
  - `issuer`: `http://localhost:7080/realms/caipe` (set to your realm issuer in each environment)
  - `audiences`: `[caipe-platform]`
  - `jwks.url`: realm JWKS (compose uses `http://keycloak:7080/realms/caipe/protocol/openid-connect/certs` for in-network Keycloak)
2.3 HTTP route CEL (tenant + subject)
Authorization rules on the HTTP route:
- Deny if no `jwt.sub`
- Deny if `jwt.org` and header `x_tenant_id` are both present and differ (tenant mismatch)
- Allow if `jwt.sub` is present
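The three route rules can be modeled outside CEL to reason about edge cases. This is a sketch only, assuming the claims are available as a mapping and the tenant header value is passed separately; `route_decision` is a hypothetical name, and the real rules live as CEL in `config.yaml`.

```python
from typing import Mapping, Optional


def route_decision(claims: Mapping[str, str], tenant_header: Optional[str]) -> str:
    """Model of the HTTP route rules: returns "allow" or "deny"."""
    # Rule 1: a token without a subject is always denied.
    if not claims.get("sub"):
        return "deny"
    # Rule 2: deny only when both values are present and they differ.
    org = claims.get("org")
    if org and tenant_header and org != tenant_header:
        return "deny"
    # Rule 3: a subject is present and no tenant mismatch was found.
    return "allow"
```

Note that, as the rules are described, a request with no `x_tenant_id` header (or a token with no `org` claim) passes the tenant check; only an explicit mismatch is denied.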
2.4 MCP authorization CEL (mcpAuthorization.rules)
Rules are allow-if-any-match (documented inline in config). They gate tool names by prefix and realm roles in jwt.realm_access.roles, including:
- Admin-only: `admin_*`, `supervisor_config*`
- RAG: `rag_query*`, `rag_ingest*`, `rag_tool*`
- Team tools: `team_*` (with `admin` / `kb_admin` / `team_member` branches)
- Dynamic agent tools: names starting with `dynamic_agent_` for chat/team/kb_admin/admin roles
- General tools: chat-capable roles excluding admin/rag_ingest/supervisor_config prefixes
mcp.targets is empty in the sample; set real MCP backend URLs per environment.
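Allow-if-any-match evaluation means a tool call is denied unless at least one rule accepts it. The sketch below models only two of the rule families listed above (admin-only prefixes and `dynamic_agent_`); it assumes that "chat/team/kb_admin/admin roles" corresponds to the realm roles `chat_user`, `team_member`, `kb_admin`, and `admin`. The function names are hypothetical and the authoritative rules remain the CEL in `config.yaml`.

```python
from typing import Callable, Iterable, List, Set

Rule = Callable[[str, Set[str]], bool]


def admin_only(tool: str, roles: Set[str]) -> bool:
    # Tools with admin_ or supervisor_config prefixes require the admin role.
    return tool.startswith(("admin_", "supervisor_config")) and "admin" in roles


def dynamic_agent(tool: str, roles: Set[str]) -> bool:
    # Assumed role mapping; see the lead-in above.
    allowed = {"chat_user", "team_member", "kb_admin", "admin"}
    return tool.startswith("dynamic_agent_") and bool(roles & allowed)


RULES: List[Rule] = [admin_only, dynamic_agent]


def mcp_allowed(tool: str, roles: Iterable[str]) -> bool:
    """Allow if any rule matches; with no match the call is denied."""
    role_set = set(roles)
    return any(rule(tool, role_set) for rule in RULES)
```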
2.5 Production checklist
- TLS termination and correct issuer / JWKS URLs for your Keycloak hostname
- Rotate secrets; do not use dev client secrets from the repo export
- Align CEL rules with permission-matrix.md and your IdP role names
3. CEL policy rules (where they live)
3.1 Admin UI tab gates (admin_tab_policies)
- Storage: MongoDB collection `admin_tab_policies`
- API: `GET` / `PUT` via BFF routes under `ui/src/app/api/rbac/admin-tab-gates/` and policies listing `admin-tab-policies`
- Behavior: CEL runs per tab; context includes `user.email`, `user.roles` (JWT realm roles plus session/bootstrap admin), and `user.teams`. Feature flags are ANDed with CEL for several tabs (see `docs/docs/api/rbac-roles.md`)
3.2 BFF route CEL (CEL_RBAC_EXPRESSIONS)
- Env: `CEL_RBAC_EXPRESSIONS`, a JSON map of `resource#scope` → CEL expression string
- Applied in: `requireRbacPermission()` in `ui/src/lib/api-middleware.ts` after Keycloak allows or role-fallback allows
- Evaluator: `ui/src/lib/rbac/cel-evaluator.ts`; failures fail closed (deny)
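To illustrate the shape of the `CEL_RBAC_EXPRESSIONS` value, the sketch below parses a JSON map and looks up the expression for a `resource#scope` key. The sample expression string and its `user.roles` variable are assumptions for illustration (the context variables available to BFF route CEL are not listed here), and `load_cel_expressions` / `expression_for` are hypothetical names rather than the BFF's own functions.

```python
import json
from typing import Dict, Optional


def load_cel_expressions(raw: Optional[str]) -> Dict[str, str]:
    """Parse the env value into a {"resource#scope": "<CEL>"} map."""
    if not raw:
        return {}  # unset or empty: no supplementary checks configured
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("CEL_RBAC_EXPRESSIONS must be a JSON object")
    return {str(key): str(value) for key, value in parsed.items()}


def expression_for(expressions: Dict[str, str], resource: str, scope: str) -> Optional[str]:
    """Return the CEL string for resource#scope, or None if none is configured."""
    return expressions.get(f"{resource}#{scope}")


# Hypothetical example value; the CEL text is illustrative only.
SAMPLE = '{"admin_ui#view": "\'admin\' in user.roles"}'
```

With the sample value, `expression_for(load_cel_expressions(SAMPLE), "admin_ui", "view")` returns the CEL string, while an unconfigured key such as `rag#query` returns `None`.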
3.3 Agent Gateway
- Inline CEL in `deploy/agentgateway/config.yaml` (see §2)
3.4 RAG server (optional CEL layer)
- Env: `CEL_KB_ACCESS_EXPRESSION`, `CEL_KB_ACCESS_EXPRESSIONS` (JSON map per KB/datasource)
- Code: `ai_platform_engineering/knowledge_bases/rag/server/src/server/rbac.py`
- If expressions are set but `cel_evaluator` is unavailable, KB filtering denies (fail-closed) or returns 503 when enforcement is required; see code paths `_filter_kb_ids_by_cel` / `_enforce_cel_kb_access`
Per-KB access also uses Keycloak roles and MongoDB team ownership without requiring CEL to be configured (CEL is an additional configurable layer per FR-029).
4. ASP tool policy composition (FR-012)
Enterprise RBAC (Keycloak / AG realm roles + matrix) and ASP / Global Tool Authorization are separate layers:
- RBAC evaluated first (BFF Keycloak UMA or AG CEL).
- If RBAC denies β request denied.
- If RBAC allows β ASP still applies where wired (e.g. supervisor tool filtering).
- If ASP denies β deny wins (effective access = intersection).
Documented in permission-matrix.md § Composition with ASP.
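The composition rule (effective access is the intersection of the two layers) reduces to a short function. This is an illustrative model, assuming each layer yields a boolean and that `None` stands for "ASP is not wired on this path"; `effective_access` is a hypothetical name.

```python
from typing import Optional


def effective_access(rbac_allows: bool, asp_allows: Optional[bool]) -> bool:
    """RBAC is evaluated first; ASP can only narrow access, never widen it."""
    if not rbac_allows:
        return False  # RBAC denial is final; ASP is not consulted
    if asp_allows is None:
        return True  # ASP not wired for this path: the RBAC result stands
    return asp_allows  # deny wins when ASP rejects
```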
5. Fail-closed behavior
5.1 Keycloak unavailable (BFF / UI path)
- `checkPermission()` in `ui/src/lib/rbac/keycloak-authz.ts` returns `DENY_PDP_UNAVAILABLE` on network/HTTP errors.
- `requireRbacPermission()` then does not use role fallback for that outcome: it logs and throws 503 "Authorization service unavailable - access denied (fail-closed)".
- When Keycloak returns a normal 403 denial, the user gets 403 with the standard denial payload.
Role fallback applies only when the PDP returns a negative result that is not classified as PDP unavailable (see code: fallback for admin_ui/supervisor/rag minimum roles). It is intended for gradual rollout, not for bypassing a down PDP.
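The outcomes described above can be summarized as a small decision table. The sketch below is a model of that behavior, not the TypeScript in `api-middleware.ts`: the enum, the function name `http_status`, and the use of 200 to mean "the handler proceeds" are assumptions, and the supplementary CEL check that follows an allow is omitted.

```python
from enum import Enum


class PdpResult(Enum):
    ALLOW = "allow"
    DENY = "deny"
    DENY_PDP_UNAVAILABLE = "deny_pdp_unavailable"


def http_status(pdp: PdpResult, role_fallback_allows: bool) -> int:
    """Map a PDP outcome plus role fallback to the resulting HTTP status."""
    if pdp is PdpResult.DENY_PDP_UNAVAILABLE:
        return 503  # fail closed: role fallback is never consulted
    if pdp is PdpResult.ALLOW or role_fallback_allows:
        return 200  # request proceeds (assumed status for illustration)
    return 403  # normal denial with the standard payload
```

The key property is the first branch: a user whose roles would satisfy the fallback still receives 503 while Keycloak is unreachable.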
5.2 Agent Gateway unavailable
- MCP/A2A/agent traffic cannot be validated or proxied → requests fail (connection errors). Product expectation (FR-013): fail closed, with no silent bypass around AG for those paths.
5.3 MongoDB unavailable
- Admin tab CEL gates: depend on MongoDB for `admin_tab_policies`; failures should not grant tabs (implementation returns safe defaults / denies; verify in the `admin-tab-gates` route when operating).
- Team-scoped data (teams collection, ownership): `getUserTeamIds` and similar helpers catch errors and may return empty lists, which can narrow access or break features; do not assume elevated access.
- RAG: if team ownership lookup cannot run where required, the spec requires fail closed for query filtering (FR-027); see the RAG `rbac.py` implementation.
5.4 CEL evaluation errors (BFF)
cel-evaluator.ts: parse/runtime errors → false (deny).
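A fail-closed wrapper treats any evaluation error as a denial. The following sketch shows the pattern with an injected evaluator callable, since the actual CEL engine is not reproduced here; `evaluate_fail_closed` is a hypothetical name, and treating non-boolean results as a denial is an assumption of this sketch.

```python
from typing import Any, Callable, Mapping

Evaluator = Callable[[str, Mapping[str, Any]], Any]


def evaluate_fail_closed(evaluate: Evaluator, expression: str,
                         context: Mapping[str, Any]) -> bool:
    """Return True only when evaluation succeeds and yields exactly True."""
    try:
        return evaluate(expression, context) is True
    except Exception:
        # Parse and runtime errors both collapse to a denial.
        return False
```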
6. Bootstrap admin (BOOTSTRAP_ADMIN_EMAILS)
- Purpose: Comma-separated list of emails treated as admin on login when IdP group → role mapping is not yet configured.
- Implementation: `ui/src/lib/auth-config.ts` (`isBootstrapAdmin`), also used from `getAuthenticatedUser` / `requireRbacPermission` role fallback for `admin_ui` when the email matches.
- Operational guidance: Remove or empty the variable after realm roles and group mappers are correct; it is a break-glass bootstrap, not a long-term RBAC model.
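Handling of a comma-separated email list can be sketched as follows. Whitespace trimming and case-insensitive comparison are assumptions of this sketch and may differ from `isBootstrapAdmin`; the Python names are hypothetical.

```python
from typing import Optional, Set


def parse_bootstrap_admins(raw: Optional[str]) -> Set[str]:
    """Split the env value on commas, dropping blanks and normalizing case."""
    return {part.strip().lower() for part in (raw or "").split(",") if part.strip()}


def is_bootstrap_admin(email: str, raw: Optional[str]) -> bool:
    """True if the email appears in the configured list."""
    return email.strip().lower() in parse_bootstrap_admins(raw)
```

An unset or empty variable yields an empty set, so nobody is treated as a bootstrap admin once the variable is removed.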
7. Environment variables (CAIPE UI / BFF)
Copy from ui/.env.example and ui/env.example into .env.local. Below is a consolidated name + description list (no secret values).
OIDC / NextAuth
| Variable | Description |
|---|---|
| NEXTAUTH_SECRET | NextAuth session encryption secret |
| NEXTAUTH_URL | Public base URL of the UI (callbacks) |
| NEXT_PUBLIC_SSO_ENABLED | Enable SSO UI paths (true/false) |
| OIDC_ISSUER | Keycloak realm issuer URL |
| OIDC_CLIENT_ID | OIDC client (typically caipe-ui) |
| OIDC_CLIENT_SECRET | Client secret |
| OIDC_REQUIRED_GROUP | Optional: require group membership to use app |
| OIDC_REQUIRED_ADMIN_GROUP | Optional: map matching realm role name in token to admin session role |
| OIDC_GROUP_CLAIM | Optional: claim name(s) for groups |
| OIDC_ENABLE_REFRESH_TOKEN | Optional: disable refresh if IdP lacks offline_access |
Keycloak Admin API: UI BFF (FR-024)
Used by the Next.js BFF (ui/src/lib/rbac/keycloak-admin.ts) for role-mapping CRUD, IdP config, etc. It resolves credentials in this order:
- `client_credentials` grant against the `caipe` realm using `KEYCLOAK_ADMIN_CLIENT_ID` + `KEYCLOAK_ADMIN_CLIENT_SECRET` (when both are non-empty).
- Otherwise it falls back to the `master` realm `password` grant with hardcoded `admin-cli` / `admin` / `admin` (dev only).
| Variable | Description |
|---|---|
| KEYCLOAK_URL | Keycloak base URL |
| KEYCLOAK_REALM | Realm name (caipe) |
| KEYCLOAK_ADMIN_CLIENT_ID | UI BFF admin client (admin-cli dev or dedicated client prod) |
| KEYCLOAK_ADMIN_CLIENT_SECRET | Optional; empty triggers password grant in dev (see .env.example) |
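The credential resolution order for the UI BFF can be modeled as a pure function. This sketch only selects the realm and grant type and performs no network call; `choose_admin_grant` and the returned dictionary shape are hypothetical.

```python
from typing import Dict, Optional


def choose_admin_grant(client_id: Optional[str], client_secret: Optional[str],
                       realm: str = "caipe") -> Dict[str, str]:
    """Pick the grant the BFF would attempt, per the order described above."""
    if client_id and client_secret:
        # Both values non-empty: service-account style login on the app realm.
        return {"realm": realm, "grant_type": "client_credentials"}
    # Dev-only fallback: password grant against the master realm.
    return {"realm": "master", "grant_type": "password"}
```

Setting `KEYCLOAK_ADMIN_CLIENT_ID` without a secret therefore still lands on the dev-only password fallback, which is worth checking for in production configurations.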
Keycloak Admin API: Slack bot (FR-025 identity lookup)
Used by ai_platform_engineering/integrations/slack_bot/utils/keycloak_admin.py to find a Keycloak user by slack_user_id user attribute and read/write team_id. Always uses client_credentials against the caipe realm; there is no password fallback.
The client referenced here MUST be confidential and have these realm-management roles: view-users, query-users (and manage-users if you also use the bot to set attributes).
| Variable | Description |
|---|---|
| KEYCLOAK_SLACK_BOT_ADMIN_CLIENT_ID | Slack bot's admin client. Default caipe-platform (the realm seeder grants the required roles). Do NOT set this to admin-cli: it is a public client and rejects client_credentials with HTTP 401. |
| KEYCLOAK_SLACK_BOT_ADMIN_CLIENT_SECRET | Matching client_secret. In dev, defaults to caipe-platform-dev-secret. |
Why a separate name from `KEYCLOAK_ADMIN_*`, and why include the surface name? Pre-098, the slack-bot read the same `KEYCLOAK_ADMIN_*` vars as the UI. A single `KEYCLOAK_ADMIN_CLIENT_ID=admin-cli` line in `.env` (intended for the UI's password-grant fallback) would silently override the slack-bot's client_credentials path, producing `HTTP 401 "Public client not allowed to retrieve service account"` on every Slack mention. The surface-specific `KEYCLOAK_SLACK_BOT_ADMIN_*` names eliminate that namespace collision permanently and leave room for future bot surfaces (e.g. `KEYCLOAK_WEBEX_BOT_ADMIN_*`, `KEYCLOAK_TEAMS_BOT_ADMIN_*`) without another rename.
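For reference, a `client_credentials` request targets the realm's standard OpenID Connect token endpoint with a form-encoded body. The sketch below only builds the URL and body so the shape can be inspected; it is not the Slack bot's `keycloak_admin.py`, `build_token_request` is a hypothetical name, and the secret shown in the usage note is a placeholder.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_token_request(base_url: str, realm: str,
                        client_id: str, client_secret: str) -> Tuple[str, str]:
    """Return (token_url, form_body) for a client_credentials grant."""
    token_url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form_body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return token_url, form_body
```

Calling `build_token_request("http://localhost:7080", "caipe", "caipe-platform", "placeholder")` yields the `caipe` realm token URL; the body would be sent with `Content-Type: application/x-www-form-urlencoded`.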
Keycloak Authorization Services client (UMA checks)
| Variable | Description |
|---|---|
| KEYCLOAK_RESOURCE_SERVER_ID | Audience / resource server client id (default caipe-platform) |
| KEYCLOAK_CLIENT_SECRET | Secret for caipe-platform when required by your token exchange / setup |
RBAC / CEL (BFF)
| Variable | Description |
|---|---|
| RBAC_CACHE_TTL_SECONDS | TTL for permission decision cache (default 60; 0 disables) |
| CEL_RBAC_EXPRESSIONS | JSON map resource#scope → CEL string for supplementary checks |
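The effect of `RBAC_CACHE_TTL_SECONDS` (entries expire after the TTL; a value of 0 disables caching) can be illustrated with a small in-memory cache. This is a sketch under stated assumptions, not the BFF's cache: the class name, the cache key format, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DecisionCache:
    """Tiny TTL cache for boolean permission decisions."""

    def __init__(self, ttl_seconds: float = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get(self, key: str) -> Optional[bool]:
        if self.ttl <= 0:
            return None  # caching disabled: always a miss
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, decision = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return decision

    def put(self, key: str, decision: bool) -> None:
        if self.ttl > 0:
            self._entries[key] = (self.clock() + self.ttl, decision)
```

A longer TTL reduces load on Keycloak but delays the effect of role changes by up to that many seconds, which is the trade-off to weigh when tuning the value.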
Bootstrap
| Variable | Description |
|---|---|
| BOOTSTRAP_ADMIN_EMAILS | Comma-separated emails with bootstrap admin |
Data / URLs
| Variable | Description |
|---|---|
| MONGODB_URI / MONGODB_DATABASE | MongoDB connection and DB name |
| NEXT_PUBLIC_MONGODB_ENABLED | Client hint for Mongo mode |
| NEXT_PUBLIC_CAIPE_URL / NEXT_PUBLIC_A2A_BASE_URL | Supervisor / A2A base URL |
| NEXT_PUBLIC_RAG_URL | RAG server URL |
Feature flags (admin tabs, audit, tickets, …)
See ui/src/lib/config.ts for full list: e.g. FEEDBACK_ENABLED, NPS_ENABLED, AUDIT_LOGS_ENABLED, ACTION_AUDIT_ENABLED, REPORT_PROBLEM_ENABLED, ticket integration vars, workflow runner, etc.
Slack linking (BFF)
| Variable | Description |
|---|---|
| SLACK_BOT_TOKEN | Used by BFF to post Slack DM after identity link (FR-025) |
Docker Compose may set additional names (KEYCLOAK_BOT_CLIENT_* for bot, etc.); see docker-compose.dev.yaml for the slack-bot and caipe-ui services.
Related documents
- permission-matrix.md: FR-008 / FR-014 capability matrix
- security-review.md: verification checklist
- quickstart.md: local bring-up
- spec.md: normative requirements