Version: main 🚧

Operator Guide: Enterprise RBAC (098)

Audience: Platform operators deploying CAIPE with Keycloak, Agent Gateway, and the CAIPE UI BFF.
Sources of truth: deploy/keycloak/realm-config.json, deploy/agentgateway/config.yaml, ui/src/lib/api-middleware.ts, ui/src/lib/rbac/, RAG server rbac.py.

1. Keycloak realm setup (caipe)​

1.1 Import and dev stack​

  • Realm export: deploy/keycloak/realm-config.json is bind-mounted into the Keycloak container by deploy/keycloak/docker-compose.yml as --import-realm data (see that compose file for ports; quickstart uses http://localhost:7080).
  • After import, verify realm caipe is enabled and clients exist (below).

1.2 Realm roles (global)​

Defined under roles.realm in realm-config.json:

| Role | Purpose (from export descriptions) |
| --- | --- |
| admin | Full platform administration |
| chat_user | Invoke supervisor, tools, MCP, A2A, skills (baseline chat user) |
| team_member | Create/manage team-scoped RAG tools |
| kb_admin | KB administration and ingest |
| offline_access | Refresh tokens (OIDC) |
| uma_authorization | UMA / Authorization Services participation |

Note: denied in the permission matrix is a test persona (user with no chat roles), not a realm role in the export.

1.3 Per-resource and per-KB realm roles (conventions)​

The export includes examples of fine-grained KB roles; production deployments add more the same way:

| Pattern | Meaning |
| --- | --- |
| kb_reader:<kb-id> | Read/query KB <kb-id> |
| kb_ingestor:<kb-id> | Ingest into <kb-id> |
| kb_admin:<kb-id> | Admin for <kb-id> |
| kb_reader:* | Read all KBs (wildcard) |

Agent / task / skill roles follow the spec (FR-028): agent_user:<id>, agent_admin:<id>, and analogously task_user:<id>, task_admin:<id>, skill_user:<id>, skill_admin:<id> with wildcards :* where appropriate. These are not all pre-created in realm-config.json; assign them via Admin UI / Keycloak Admin API when provisioning resources.
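The naming convention above can be checked with a few lines of code. This is a minimal sketch, not the platform's matcher: the function name and the example KB ids are hypothetical, and it only models the exact-id and `:*` wildcard forms described in this section. Whether a global role such as kb_admin implies per-KB access is not covered here.

```python
from typing import Iterable


def has_scoped_role(roles: Iterable[str], prefix: str, resource_id: str) -> bool:
    """Return True if roles contain '<prefix>:<resource_id>' or the wildcard '<prefix>:*'."""
    wanted = {f"{prefix}:{resource_id}", f"{prefix}:*"}
    return any(role in wanted for role in roles)


# Hypothetical KB ids, used only for illustration.
print(has_scoped_role(["kb_reader:team-a-docs"], "kb_reader", "team-a-docs"))  # True
print(has_scoped_role(["kb_reader:team-a-docs"], "kb_reader", "finance"))      # False
print(has_scoped_role(["kb_reader:*"], "kb_reader", "finance"))                # True
```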

1.4 Keycloak Authorization Services resources​

Client caipe-platform has authorizationServicesEnabled: true and defines resources (type caipe:component) with scopes:

| Resource | Scopes (subset) |
| --- | --- |
| admin_ui | view, configure, admin, audit.view |
| slack | view, invoke, admin |
| supervisor | invoke, configure, admin |
| rag | query, ingest, admin, tool.create, tool.update, tool.delete, tool.view, kb.admin, kb.ingest, kb.query |
| sub_agent | invoke, configure, admin |
| tool | invoke, configure, admin |
| skill | view, invoke, configure, delete |
| a2a | create, view, configure, delete, admin |
| mcp | invoke, view, admin |

Policies in the export map realm roles to these scopes (e.g. admin-role-policy, chat-user-role-policy, team-member-role-policy, kb-admin-role-policy plus composite scope policies such as rag-query-access, rag-team-tool-access, slack-access). Operators should extend policies when product matrix rows require roles beyond what the sample export grants (see permission-matrix.md § Keycloak export alignment).

1.5 Clients​

| Client ID | Purpose | Notes from export |
| --- | --- | --- |
| caipe-ui | Next.js / NextAuth OIDC | Confidential, standard flow, authorizationServicesEnabled: false, redirect http://localhost:3000/* (adjust for prod) |
| caipe-platform | Resource server + PDP for UMA | Authorization Services enabled; used as audience for permission checks and Agent Gateway JWT audience |
| caipe-slack-bot | Bot service account + OBO | serviceAccountsEnabled: true, standardFlowEnabled: false, directAccessGrantsEnabled: false, attribute oidc.token.exchange.enabled: true |

1.6 Client scopes and protocol mappers​

Default realm client scopes (defaultDefaultClientScopes): profile, email, roles, groups, org.

Important mappers:

  • roles scope: realm-roles → JWT claim roles (multivalued string), also on userinfo/id token per mapper config.
  • groups scope: maps user attribute idp_groups → claim groups (FR-010; populated by IdP / broker mappers).
  • org scope: user attribute org → claim org (tenant hint, FR-020).
  • profile scope: includes the caipe-audience mapper, which adds the custom audience caipe-platform to tokens so that resource-server and AG validation can accept them.

Identity provider mappers (Okta / Entra examples in export) illustrate importing groups into idp_groups and optional hardcoded role assignment from IdP group values.
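To confirm that the mappers in this section emit the expected claims, it can help to decode a token's payload locally. The sketch below is a debugging aid under stated assumptions: it does not verify the signature, the helper names are hypothetical, and the sample token is fabricated in-process rather than issued by Keycloak.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature (debugging only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def claim_problems(claims: dict, audience: str = "caipe-platform") -> list:
    """List deviations from the claims this section says the mappers should produce."""
    problems = []
    aud = claims.get("aud", [])
    if isinstance(aud, str):  # 'aud' may be a single string or a list
        aud = [aud]
    if audience not in aud:
        problems.append(f"aud does not include {audience}")
    for name in ("roles", "groups", "org"):
        if name not in claims:
            problems.append(f"claim {name} is absent")
    return problems


def _segment(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JWT segment (test fixture helper)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Fabricated, unsigned token used only to exercise the helpers.
sample_token = ".".join([
    _segment({"alg": "none"}),
    _segment({"sub": "u1", "aud": ["caipe-platform"], "roles": ["chat_user"], "org": "org-a"}),
    "",
])
print(claim_problems(decode_jwt_payload(sample_token)))  # ['claim groups is absent']
```

An absent groups or org claim does not necessarily indicate a broken mapper; it may only mean the corresponding user attribute is unset for that user.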

1.7 Sample users​

realm-config.json includes seed users (e.g. admin@example.com, standard@example.com, kbadmin@example.com, denied@example.com, orgb@example.com) with differing realm roles for testing; change passwords before any non-local use.


2. Agent Gateway deployment​

2.1 Layout​

  • Compose: deploy/agentgateway/docker-compose.yml
  • Config: deploy/agentgateway/config.yaml

2.2 JWT validation (strict mode)​

From config.yaml:

  • Listener jwtAuth: mode: strict
  • issuer: http://localhost:7080/realms/caipe (set to your realm issuer in each environment)
  • audiences: [caipe-platform]
  • jwks.url: realm JWKS (compose uses http://keycloak:7080/realms/caipe/protocol/openid-connect/certs for in-network Keycloak)

2.3 HTTP route CEL (tenant + subject)​

Authorization rules on the HTTP route:

  • Deny if no jwt.sub
  • Deny if jwt.org and header x_tenant_id both present and differ (tenant mismatch)
  • Allow if jwt.sub present
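The three rules above can be read as the following decision function. It is a plain-Python paraphrase for reasoning about outcomes, not the CEL from config.yaml; the function name and the dict-shaped inputs are assumptions.

```python
def http_route_decision(jwt_claims: dict, headers: dict) -> str:
    """Paraphrase of the HTTP route rules: subject required, tenant must not conflict."""
    if not jwt_claims.get("sub"):
        return "deny"  # no subject in the token
    org = jwt_claims.get("org")
    tenant = headers.get("x_tenant_id")
    if org and tenant and org != tenant:
        return "deny"  # both present and different: tenant mismatch
    return "allow"     # subject present and no mismatch detected


print(http_route_decision({"sub": "u1", "org": "org-a"}, {"x_tenant_id": "org-a"}))  # allow
print(http_route_decision({"sub": "u1", "org": "org-a"}, {"x_tenant_id": "org-b"}))  # deny
print(http_route_decision({"sub": "u1"}, {"x_tenant_id": "org-b"}))                  # allow
```

Note the third case: as described, a mismatch is only detected when both jwt.org and the header are present, so a token without an org claim is not rejected by this rule.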

2.4 MCP authorization CEL (mcpAuthorization.rules)​

Rules are allow-if-any-match (documented inline in config). They gate tool names by prefix and realm roles in jwt.realm_access.roles, including:

  • Admin-only: admin_*, supervisor_config*
  • RAG: rag_query*, rag_ingest*, rag_tool*
  • Team tools: team_* (with admin / kb_admin / team_member branches)
  • Dynamic agent tools: names starting with dynamic_agent_ for chat/team/kb_admin/admin roles
  • General tools: chat-capable roles excluding admin/rag_ingest/supervisor_config prefixes
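Allow-if-any-match semantics are easy to misread, so here is a reduced model of three of the rule families above. It is a sketch under assumptions: the set of "chat-capable" roles is a guess, the RAG and team branches are omitted because their role lists are not given here, and the tool names are invented. Treat config.yaml as authoritative.

```python
# Assumed meaning of "chat-capable roles"; verify against config.yaml.
CHAT_CAPABLE = {"chat_user", "team_member", "kb_admin", "admin"}


def mcp_tool_allowed(tool: str, roles: set) -> bool:
    """Allow if ANY rule matches (reduced model; RAG and team rules omitted)."""
    rules = [
        # Admin-only prefixes.
        tool.startswith(("admin_", "supervisor_config")) and "admin" in roles,
        # Dynamic agent tools for chat/team/kb_admin/admin roles.
        tool.startswith("dynamic_agent_") and bool(roles & CHAT_CAPABLE),
        # General tools: any name outside the excluded prefixes.
        not tool.startswith(("admin_", "rag_ingest", "supervisor_config"))
        and bool(roles & CHAT_CAPABLE),
    ]
    return any(rules)


print(mcp_tool_allowed("admin_reset_cache", {"chat_user"}))  # False
print(mcp_tool_allowed("admin_reset_cache", {"admin"}))      # True
print(mcp_tool_allowed("search_docs", {"chat_user"}))        # True
```

One consequence of any-match evaluation in a model like this: a prefix is restricted only if the general rule also excludes it, so new restricted prefixes need an entry in both places.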

mcp.targets is empty in the sample; set real MCP backend URLs per environment.

2.5 Production checklist​

  • TLS termination and correct issuer / JWKS URLs for your Keycloak hostname
  • Rotate secrets; do not use dev client secrets from the repo export
  • Align CEL rules with permission-matrix.md and your IdP role names

3. CEL policy rules (where they live)​

3.1 Admin UI tab gates (admin_tab_policies)​

  • Storage: MongoDB collection admin_tab_policies
  • API: GET/PUT via BFF routes under ui/src/app/api/rbac/admin-tab-gates/ and policies listing admin-tab-policies
  • Behavior: CEL runs per tab. The context includes user.email, user.roles (JWT realm roles plus session/bootstrap admin), and user.teams. For several tabs, feature flags are ANDed with the CEL result (see docs/docs/api/rbac-roles.md)

3.2 BFF route CEL (CEL_RBAC_EXPRESSIONS)​

  • Env: CEL_RBAC_EXPRESSIONS, a JSON map of resource#scope → CEL expression string
  • Applied in: requireRbacPermission() in ui/src/lib/api-middleware.ts after Keycloak allows or role-fallback allows
  • Evaluator: ui/src/lib/rbac/cel-evaluator.ts; failures fail closed (deny)
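The shape of the map and the fail-closed behavior can be illustrated as follows. This is a sketch, not the BFF code: the real evaluator is TypeScript, `toy_evaluate` is a stand-in that understands exactly one expression, the `user.roles` context name is borrowed from the admin tab gates, and treating a missing `resource#scope` entry as "no extra restriction" is an assumption.

```python
import json
from typing import Callable


def supplementary_check(
    expressions: dict,
    resource: str,
    scope: str,
    context: dict,
    evaluate: Callable[[str, dict], object],
) -> bool:
    """Apply the optional CEL expression for resource#scope, denying on any error."""
    expr = expressions.get(f"{resource}#{scope}")
    if expr is None:
        return True  # assumption: no expression means no supplementary restriction
    try:
        return evaluate(expr, context) is True
    except Exception:
        return False  # parse or runtime failure: fail closed


def toy_evaluate(expr: str, context: dict) -> bool:
    """Stand-in for a CEL evaluator; supports a single expression for the demo."""
    if expr == '"admin" in user.roles':
        return "admin" in context["user"]["roles"]
    raise ValueError(f"unsupported expression: {expr}")


# What the environment variable's value might look like (illustrative entries).
env_value = json.dumps({
    "admin_ui#configure": '"admin" in user.roles',
    "rag#ingest": "this is not valid CEL",
})
expressions = json.loads(env_value)
admin = {"user": {"roles": ["admin"]}}
viewer = {"user": {"roles": ["chat_user"]}}

print(supplementary_check(expressions, "admin_ui", "configure", admin, toy_evaluate))   # True
print(supplementary_check(expressions, "admin_ui", "configure", viewer, toy_evaluate))  # False
print(supplementary_check(expressions, "rag", "ingest", admin, toy_evaluate))           # False
```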

3.3 Agent Gateway​

  • Inline CEL in deploy/agentgateway/config.yaml (see §2)

3.4 RAG server (optional CEL layer)​

  • Env: CEL_KB_ACCESS_EXPRESSION, CEL_KB_ACCESS_EXPRESSIONS (JSON map per KB/datasource)
  • Code: ai_platform_engineering/knowledge_bases/rag/server/src/server/rbac.py
  • If expressions are set but cel_evaluator is unavailable, KB filtering denies (fail-closed) or returns 503 when enforcement is required; see code paths _filter_kb_ids_by_cel / _enforce_cel_kb_access

Per-KB access also uses Keycloak roles and MongoDB team ownership without requiring CEL to be configured (CEL is an additional configurable layer per FR-029).


4. ASP tool policy composition (FR-012)​

Enterprise RBAC (Keycloak / AG realm roles + matrix) and ASP / Global Tool Authorization are separate layers:

  1. RBAC evaluated first (BFF Keycloak UMA or AG CEL).
  2. If RBAC denies → request denied.
  3. If RBAC allows → ASP still applies where wired (e.g. supervisor tool filtering).
  4. If ASP denies → deny wins (effective access = intersection).

Documented in permission-matrix.md § Composition with ASP.
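The intersection rule can be stated in a few lines. This is an illustrative sketch only: the function name, the use of `None` for "ASP not wired on this path", and the tool names are assumptions.

```python
from typing import Optional, Set


def effective_tools(rbac_allowed: Set[str], asp_allowed: Optional[Set[str]]) -> Set[str]:
    """Effective access is what RBAC allows, further narrowed by ASP where ASP is wired."""
    if asp_allowed is None:  # ASP not wired on this path
        return set(rbac_allowed)
    return rbac_allowed & asp_allowed  # a deny from either layer wins


rbac = {"search_docs", "create_ticket"}
print(sorted(effective_tools(rbac, {"search_docs", "delete_repo"})))  # ['search_docs']
print(sorted(effective_tools(rbac, None)))  # ['create_ticket', 'search_docs']
```

ASP can only narrow the result: a tool that ASP would allow but RBAC denies (delete_repo above) never becomes available.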


5. Fail-closed behavior​

5.1 Keycloak unavailable (BFF / UI path)​

  • checkPermission() in ui/src/lib/rbac/keycloak-authz.ts returns DENY_PDP_UNAVAILABLE on network/HTTP errors.
  • requireRbacPermission() does not use role fallback for that outcome: it logs the failure and throws 503 with a message that begins "Authorization service unavailable" and ends "access denied (fail-closed)".
  • When Keycloak returns a normal 403 denial, the user gets 403 with the standard denial payload.

Role fallback applies only when the PDP returns a negative result that is not classified as PDP unavailable (see code: fallback for admin_ui/supervisor/rag minimum roles). It is intended for gradual rollout, not for bypassing a down PDP.
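The outcomes described in this subsection map to HTTP statuses roughly as follows. This is a sketch of the decision table, not the middleware: only DENY_PDP_UNAVAILABLE is a name taken from the code; the other enum members and the function are hypothetical, and the supplementary CEL step from 3.2 is left out.

```python
from enum import Enum


class PdpResult(Enum):
    ALLOW = "allow"                                # hypothetical name
    DENY = "deny"                                  # hypothetical name
    DENY_PDP_UNAVAILABLE = "deny_pdp_unavailable"  # name used in keycloak-authz.ts


def response_status(pdp: PdpResult, role_fallback_allows: bool) -> int:
    """Map a PDP outcome (plus role fallback) to the HTTP status the caller sees."""
    if pdp is PdpResult.ALLOW:
        return 200
    if pdp is PdpResult.DENY_PDP_UNAVAILABLE:
        return 503  # fail closed; role fallback is deliberately not consulted
    # Ordinary denial: role fallback may still grant access during gradual rollout.
    return 200 if role_fallback_allows else 403


print(response_status(PdpResult.DENY_PDP_UNAVAILABLE, role_fallback_allows=True))  # 503
print(response_status(PdpResult.DENY, role_fallback_allows=True))                  # 200
print(response_status(PdpResult.DENY, role_fallback_allows=False))                 # 403
```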

5.2 Agent Gateway unavailable​

  • MCP/A2A/agent traffic cannot be validated or proxied → requests fail (connection errors). Product expectation (FR-013): fail closed, with no silent bypass around AG for those paths.

5.3 MongoDB unavailable​

  • Admin tab CEL gates: depend on MongoDB for admin_tab_policies; failures should not grant tabs (implementation returns safe defaults / denies; verify in the admin-tab-gates route when operating).
  • Team-scoped data (teams collection, ownership): getUserTeamIds and similar helpers catch errors and may return empty lists, which can narrow access or break features; do not assume elevated access.
  • RAG: if team ownership lookup cannot run where required, the spec requires fail closed for query filtering (FR-027); see the RAG rbac.py implementation.

5.4 CEL evaluation errors (BFF)​

  • cel-evaluator.ts: parse/runtime errors → false (deny).

6. Bootstrap admin (BOOTSTRAP_ADMIN_EMAILS)​

  • Purpose: Comma-separated list of emails treated as admin on login when IdP group → role mapping is not yet configured.
  • Implementation: ui/src/lib/auth-config.ts (isBootstrapAdmin), also used from getAuthenticatedUser / requireRbacPermission role fallback for admin_ui when email matches.
  • Operational guidance: Remove or empty the variable after realm roles and group mappers are correct; it is a break-glass bootstrap, not a long-term RBAC model.
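For reference, matching against a comma-separated list usually looks like the sketch below. It is not a port of isBootstrapAdmin: trimming whitespace and comparing case-insensitively are assumptions, so check auth-config.ts before relying on either behavior.

```python
def parse_bootstrap_admins(raw: str) -> set:
    """Split a comma-separated value into a set of normalized, non-empty emails."""
    return {part.strip().lower() for part in (raw or "").split(",") if part.strip()}


def is_bootstrap_admin(email: str, raw: str) -> bool:
    """True if email appears in the list (assumed case-insensitive match)."""
    return email.strip().lower() in parse_bootstrap_admins(raw)


value = "admin@example.com, Ops@Example.com,"
print(is_bootstrap_admin("ops@example.com", value))       # True
print(is_bootstrap_admin("standard@example.com", value))  # False
print(is_bootstrap_admin("admin@example.com", ""))        # False
```

An empty or unset value yields an empty set, which matches the guidance to empty the variable once role mapping is in place.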

7. Environment variables (CAIPE UI / BFF)​

Copy from ui/.env.example and ui/env.example into .env.local. Below is a consolidated name + description list (no secret values).

OIDC / NextAuth​

| Variable | Description |
| --- | --- |
| NEXTAUTH_SECRET | NextAuth session encryption secret |
| NEXTAUTH_URL | Public base URL of the UI (callbacks) |
| NEXT_PUBLIC_SSO_ENABLED | Enable SSO UI paths (true/false) |
| OIDC_ISSUER | Keycloak realm issuer URL |
| OIDC_CLIENT_ID | OIDC client (typically caipe-ui) |
| OIDC_CLIENT_SECRET | Client secret |
| OIDC_REQUIRED_GROUP | Optional: require group membership to use app |
| OIDC_REQUIRED_ADMIN_GROUP | Optional: map matching realm role name in token to admin session role |
| OIDC_GROUP_CLAIM | Optional: claim name(s) for groups |
| OIDC_ENABLE_REFRESH_TOKEN | Optional: disable refresh if IdP lacks offline_access |

Keycloak Admin API: UI BFF (FR-024)

Used by the Next.js BFF (ui/src/lib/rbac/keycloak-admin.ts) for role-mapping CRUD, IdP config, etc. It obtains an admin token by trying the following, in order:

  1. client_credentials grant against the caipe realm using KEYCLOAK_ADMIN_CLIENT_ID + KEYCLOAK_ADMIN_CLIENT_SECRET (when both are non-empty).
  2. Otherwise falls back to the master realm password grant with hardcoded admin-cli/admin/admin (dev only).

| Variable | Description |
| --- | --- |
| KEYCLOAK_URL | Keycloak base URL |
| KEYCLOAK_REALM | Realm name (caipe) |
| KEYCLOAK_ADMIN_CLIENT_ID | UI BFF admin client (admin-cli dev or dedicated client prod) |
| KEYCLOAK_ADMIN_CLIENT_SECRET | Optional; empty triggers password grant in dev (see .env.example) |
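The credential selection described in this subsection can be sketched as a function that builds the token request without sending it. This is an approximation of the documented order, not the keycloak-admin.ts code: the function name and the caipe-admin client id are hypothetical, and reading "admin-cli/admin/admin" as client id, username, and password is an interpretation. The token endpoint path is Keycloak's standard OpenID Connect path.

```python
from typing import Dict, Tuple


def admin_token_request(env: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (token_url, form_fields) following the documented selection order."""
    base = env["KEYCLOAK_URL"].rstrip("/")
    client_id = env.get("KEYCLOAK_ADMIN_CLIENT_ID", "")
    client_secret = env.get("KEYCLOAK_ADMIN_CLIENT_SECRET", "")
    if client_id and client_secret:
        # 1. client_credentials against the configured realm.
        realm = env.get("KEYCLOAK_REALM", "caipe")
        return (
            f"{base}/realms/{realm}/protocol/openid-connect/token",
            {"grant_type": "client_credentials",
             "client_id": client_id, "client_secret": client_secret},
        )
    # 2. Dev-only fallback: password grant against the master realm.
    return (
        f"{base}/realms/master/protocol/openid-connect/token",
        {"grant_type": "password", "client_id": "admin-cli",
         "username": "admin", "password": "admin"},
    )


prod_like = {"KEYCLOAK_URL": "http://localhost:7080/", "KEYCLOAK_REALM": "caipe",
             "KEYCLOAK_ADMIN_CLIENT_ID": "caipe-admin",  # hypothetical client id
             "KEYCLOAK_ADMIN_CLIENT_SECRET": "example-secret"}
dev = {"KEYCLOAK_URL": "http://localhost:7080", "KEYCLOAK_ADMIN_CLIENT_ID": "admin-cli",
       "KEYCLOAK_ADMIN_CLIENT_SECRET": ""}

print(admin_token_request(prod_like)[1]["grant_type"])  # client_credentials
print(admin_token_request(dev)[1]["grant_type"])        # password
```

In production, setting both variables keeps the BFF off the hardcoded fallback path entirely.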

Keycloak Admin API: Slack bot (FR-025 identity lookup)

Used by ai_platform_engineering/integrations/slack_bot/utils/keycloak_admin.py to find a Keycloak user by the slack_user_id user attribute and to read/write team_id. It always uses client_credentials against the caipe realm; there is no password fallback.

The client referenced here MUST be confidential and have these realm-management roles: view-users, query-users (and manage-users if you also use the bot to set attributes).

| Variable | Description |
| --- | --- |
| KEYCLOAK_SLACK_BOT_ADMIN_CLIENT_ID | Slack bot's admin client. Default caipe-platform (the realm seeder grants the required roles). Do NOT set this to admin-cli: it is a public client and rejects client_credentials with HTTP 401. |
| KEYCLOAK_SLACK_BOT_ADMIN_CLIENT_SECRET | Matching client_secret. In dev, defaults to caipe-platform-dev-secret. |

Why a separate name from KEYCLOAK_ADMIN_*, and why include the surface name? Pre-098 the slack-bot read the same KEYCLOAK_ADMIN_* vars as the UI. A single KEYCLOAK_ADMIN_CLIENT_ID=admin-cli line in .env (intended for the UI's password-grant fallback) would silently override the slack-bot's client_credentials path, producing HTTP 401 "Public client not allowed to retrieve service account" on every Slack mention. The surface-specific KEYCLOAK_SLACK_BOT_ADMIN_* names eliminate that namespace collision permanently and leave room for future bot surfaces (e.g. KEYCLOAK_WEBEX_BOT_ADMIN_*, KEYCLOAK_TEAMS_BOT_ADMIN_*) without another rename.

Keycloak Authorization Services client (UMA checks)​

| Variable | Description |
| --- | --- |
| KEYCLOAK_RESOURCE_SERVER_ID | Audience / resource server client id (default caipe-platform) |
| KEYCLOAK_CLIENT_SECRET | Secret for caipe-platform when required by your token exchange / setup |

RBAC / CEL (BFF)​

| Variable | Description |
| --- | --- |
| RBAC_CACHE_TTL_SECONDS | TTL for permission decision cache (default 60; 0 disables) |
| CEL_RBAC_EXPRESSIONS | JSON map resource#scope → CEL string for supplementary checks |
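To make the RBAC_CACHE_TTL_SECONDS semantics concrete (default 60, 0 disables), here is a minimal TTL cache. It is a sketch, not the BFF cache: the class name, the (subject, resource, scope) key, and the injectable clock are assumptions made for illustration and testing.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # assumed cache key: (subject, resource, scope)


class DecisionCache:
    """Cache allow/deny decisions for ttl_seconds; a TTL of 0 disables caching."""

    def __init__(self, ttl_seconds: float = 60, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def get(self, subject: str, resource: str, scope: str) -> Optional[bool]:
        """Return the cached decision, or None if absent, expired, or caching is off."""
        if self.ttl <= 0:
            return None
        key = (subject, resource, scope)
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop and force a fresh PDP check
            return None
        return decision

    def put(self, subject: str, resource: str, scope: str, decision: bool) -> None:
        """Store a decision unless caching is disabled."""
        if self.ttl <= 0:
            return
        self._entries[(subject, resource, scope)] = (decision, self.clock() + self.ttl)


cache = DecisionCache(ttl_seconds=60)
cache.put("user-1", "rag", "query", True)
print(cache.get("user-1", "rag", "query"))  # True
```

With a cache like this, a role change in Keycloak may take up to the TTL to affect an already cached decision; setting the value to 0 trades that delay for a PDP call on every request.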

Bootstrap​

| Variable | Description |
| --- | --- |
| BOOTSTRAP_ADMIN_EMAILS | Comma-separated emails with bootstrap admin |

Data / URLs​

| Variable | Description |
| --- | --- |
| MONGODB_URI / MONGODB_DATABASE | MongoDB connection and DB name |
| NEXT_PUBLIC_MONGODB_ENABLED | Client hint for Mongo mode |
| NEXT_PUBLIC_CAIPE_URL / NEXT_PUBLIC_A2A_BASE_URL | Supervisor / A2A base URL |
| NEXT_PUBLIC_RAG_URL | RAG server URL |

Feature flags (admin tabs, audit, tickets, …)​

See ui/src/lib/config.ts for full list: e.g. FEEDBACK_ENABLED, NPS_ENABLED, AUDIT_LOGS_ENABLED, ACTION_AUDIT_ENABLED, REPORT_PROBLEM_ENABLED, ticket integration vars, workflow runner, etc.

Slack linking (BFF)​

| Variable | Description |
| --- | --- |
| SLACK_BOT_TOKEN | Used by BFF to post Slack DM after identity link (FR-025) |

Docker Compose may set additional names (KEYCLOAK_BOT_CLIENT_* for bot, etc.); see docker-compose.dev.yaml for the slack-bot and caipe-ui services.