Research: Enterprise RBAC for Slack and CAIPE UI
Phase 0 Output | Date: 2026-03-25 | Plan: plan.md
R-01: Keycloak Authorization Services vs Code-Based RBAC
Decision: Keycloak Authorization Services as PDP for UI/Slack paths (FR-022).
Rationale: Keycloak AuthZ provides resources, scopes, and policies natively — no custom PDP needed. Already partially implemented in ui/src/lib/rbac/keycloak-authz.ts. Sub-5ms decision latency achievable with local policy cache. Eliminates the previously considered caipe-authorization-server fallback.
Alternatives considered:
- Custom PDP service (caipe-authorization-server): rejected; adds deployment complexity, Keycloak is already required
- OPA/Rego sidecar: rejected; CEL mandated (FR-029), adding OPA creates dual-engine maintenance
- Pure code-based checks: rejected; doesn't meet configurable policy requirement (FR-029)
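As a rough illustration of what a PDP call to Keycloak Authorization Services could look like, the sketch below builds a decision-only permission request using the UMA ticket grant with response_mode=decision. The host, the realm name caipe, and the resource and scope names are hypothetical placeholders; using caipe-ui as the audience is also an assumption. The caller would send the body to the token endpoint with the user's access token as a Bearer header.

```python
from urllib.parse import urlencode

UMA_TICKET_GRANT = "urn:ietf:params:oauth:grant-type:uma-ticket"


def build_decision_request(base_url: str, realm: str, audience: str,
                           resource: str, scope: str) -> tuple[str, str]:
    """Return (token_endpoint_url, form_body) for a decision-only permission check."""
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urlencode({
        "grant_type": UMA_TICKET_GRANT,
        "audience": audience,                 # client ID of the resource server
        "permission": f"{resource}#{scope}",  # resource#scope pair being checked
        "response_mode": "decision",          # ask only for {"result": true}
    })
    return url, body


def is_permitted(status_code: int, payload: dict) -> bool:
    """Treat anything other than HTTP 200 with result=true as a deny."""
    return status_code == 200 and payload.get("result") is True


# Hypothetical host, realm, resource, and scope names, for illustration only.
url, body = build_decision_request(
    "https://keycloak.example.com", "caipe", "caipe-ui", "admin-panel", "view"
)
```

Interpreting every non-200 or non-true response as a deny keeps the caller consistent with the fail-closed behavior required elsewhere in this plan.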
R-02: CEL Evaluator Library Selection
Decision: cel-python (Python), cel-js (TypeScript) — already in use across the codebase.
Rationale: Both libraries are already imported and operational in 4 services:
- UI: ui/src/lib/rbac/cel-evaluator.ts (cel-js)
- Python shared: ai_platform_engineering/utils/cel_evaluator.py (celpy)
- RAG server: ai_platform_engineering/knowledge_bases/rag/server/src/cel_evaluator.py
- Dynamic agents: ai_platform_engineering/dynamic_agents/src/dynamic_agents/cel_evaluator.py
Shared CEL context schema (FR-029): user.roles, user.teams, user.email, user.org, resource.id, resource.type, resource.visibility, resource.owner_id, resource.shared_with_teams.
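The shared context schema can be pictured as two small records that are flattened into the mapping a CEL expression sees. The following is a minimal sketch using only the standard library; the field types, the sample values, and the example policy expression are assumptions, and the existing cel_evaluator modules may shape the context differently.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class UserContext:
    roles: list[str] = field(default_factory=list)
    teams: list[str] = field(default_factory=list)
    email: str = ""
    org: str = ""


@dataclass
class ResourceContext:
    id: str = ""
    type: str = ""
    visibility: str = ""
    owner_id: str = ""
    shared_with_teams: list[str] = field(default_factory=list)


def build_cel_context(user: UserContext, resource: ResourceContext) -> dict:
    """Produce the {"user": {...}, "resource": {...}} mapping for CEL evaluation."""
    return {"user": asdict(user), "resource": asdict(resource)}


# Hypothetical policy written against the schema; not taken from the codebase.
EXAMPLE_POLICY = (
    'resource.visibility == "public" '
    "|| resource.shared_with_teams.exists(t, t in user.teams)"
)

# Sample values are illustrative only.
ctx = build_cel_context(
    UserContext(roles=["user"], teams=["platform"], email="a@example.com", org="acme"),
    ResourceContext(id="kb-1", type="knowledge_base", visibility="team",
                    owner_id="user-42", shared_with_teams=["platform"]),
)
```

Keeping the context construction in one typed place would make it easier for the Python and TypeScript evaluators to agree on field names.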
Alternatives considered:
- Google CEL-Go with WASM — rejected; adds compilation step, not needed when native libraries work
- Custom expression parser — rejected; violates FR-029 (CEL mandated)
R-03: OBO Token Exchange with Keycloak
Decision: OAuth 2.0 Token Exchange (RFC 8693) via Keycloak's built-in token exchange endpoint.
Rationale: Keycloak supports token exchange natively (/realms/{realm}/protocol/openid-connect/token with grant_type=urn:ietf:params:oauth:grant-type:token-exchange). The resulting token carries sub (user), act (bot/agent), scope, roles, and org claims. Requires enabling the token-exchange feature on the Keycloak realm and granting the bot service account the token-exchange client role.
Key configuration:
- Enable token-exchange feature flag on Keycloak realm
- Create caipe-bot client (confidential, service account enabled)
- Grant token-exchange permission to caipe-bot for caipe-ui client
- Bot exchanges Slack user identity → Keycloak OBO token → forwarded to AG/supervisor
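To make the exchange concrete, here is a minimal sketch of the form body an RFC 8693 request carries. The parameter names come from RFC 8693; the realm name, host, secret, and the choice of caipe-ui as the audience are placeholders or assumptions. It also assumes the bot already holds a subject token for the linked user, which the notes above do not specify.

```python
from urllib.parse import parse_qs, urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_endpoint(base_url: str, realm: str) -> str:
    """Keycloak token endpoint for the given realm."""
    return f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"


def build_exchange_body(client_id: str, client_secret: str,
                        subject_token: str, audience: str) -> str:
    """Form-encode an RFC 8693 token exchange request."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "client_id": client_id,                  # the confidential bot client
        "client_secret": client_secret,
        "subject_token": subject_token,          # token representing the user
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,                    # client the new token targets
    })


# Placeholder values; a real call would POST this body to the endpoint.
endpoint = token_endpoint("https://keycloak.example.com", "caipe")
body = build_exchange_body("caipe-bot", "s3cret", "user-token", "caipe-ui")
fields = parse_qs(body)  # round-trip to inspect what would be sent
```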
Alternatives considered:
- Custom JWT minting in bot backend: rejected; bypasses the RFC 8693 exchange, so the resulting tokens are not verifiable by AG
- Passing Slack user context as metadata without token — rejected; AG requires JWT validation
R-04: Agent Gateway Deployment with Keycloak
Decision: Deploy AG as a standalone sidecar or Kubernetes service, configured with Keycloak as OIDC provider.
Rationale: AG already has a Keycloak tutorial for MCP auth. AG verifies the JWT signature against the Keycloak JWKS endpoint, checks the iss claim against the realm issuer, and then applies CEL policy rules. For local dev, AG runs as a Docker container alongside Keycloak.
Key configuration:
- AG OIDC provider: Keycloak realm JWKS endpoint
- CEL policy rules mirror the 098 permission matrix rows for MCP tool invocation
- Fail-closed: if AG is down, MCP/A2A/agent requests are denied
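The fail-closed rule can be sketched as a small wrapper on the caller side: a request is allowed only when the gateway check completes and explicitly returns an allow. The GatewayUnavailable exception and the check callables below are hypothetical stand-ins, not part of AG's API.

```python
from typing import Callable


class GatewayUnavailable(Exception):
    """Raised by a (hypothetical) gateway client when AG cannot be reached."""


def authorize_fail_closed(check: Callable[[], bool]) -> bool:
    """Return True only when the gateway check completes and explicitly allows."""
    try:
        return check() is True
    except Exception:
        # Timeouts, connection errors, and malformed replies all count as deny.
        return False


def gateway_down() -> bool:
    raise GatewayUnavailable("AG is not reachable")


allowed = authorize_fail_closed(lambda: True)
denied = authorize_fail_closed(lambda: False)
denied_when_down = authorize_fail_closed(gateway_down)
```

Comparing with `is True` rather than relying on truthiness means an unexpected return value such as a non-empty error string is also treated as a deny.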
Alternatives considered:
- Envoy + ext-authz — rejected; AG is purpose-built for MCP/A2A, Envoy is generic
- No gateway (direct MCP) — rejected; FR-013 mandates AG for MCP/A2A/agent traffic
R-05: Slack Identity Linking Flow
Decision: Interactive OAuth account linking via BFF callback at /api/auth/slack-link (FR-025).
Rationale: The BFF already has NextAuth/Keycloak integration. The flow:
- Bot generates linking URL with single-use nonce (10min TTL) + slack_user_id
- User clicks URL → redirected to Keycloak OIDC login (via federated IdP)
- BFF callback handles auth code exchange → extracts keycloak_sub
- BFF stores slack_user_id as Keycloak user attribute via Admin API
- BFF posts confirmation DM via Slack Web API
- Subsequent commands: bot queries Keycloak Admin API (find user by slack_user_id attribute) → OBO exchange
Nonce storage: MongoDB slack_link_nonces collection (ephemeral, 10min TTL index).
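The single-use, time-limited nonce behavior can be illustrated with an in-memory stand-in for the slack_link_nonces collection. This is only a sketch: the class and method names are hypothetical, and in the planned design a MongoDB TTL index would handle expiry rather than the explicit timestamp comparison shown here.

```python
import secrets
import time
from typing import Callable, Optional

NONCE_TTL_SECONDS = 600  # 10-minute TTL, as stated above


class NonceStore:
    """In-memory stand-in for the slack_link_nonces collection."""

    def __init__(self, ttl: int = NONCE_TTL_SECONDS,
                 clock: Callable[[], float] = time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # nonce -> (slack_user_id, expires_at)

    def issue(self, slack_user_id: str) -> str:
        """Create an unguessable nonce bound to one Slack user."""
        nonce = secrets.token_urlsafe(32)
        self._entries[nonce] = (slack_user_id, self._clock() + self._ttl)
        return nonce

    def consume(self, nonce: str) -> Optional[str]:
        """Return the slack_user_id once; None if unknown, reused, or expired."""
        entry = self._entries.pop(nonce, None)  # pop makes the nonce single-use
        if entry is None:
            return None
        slack_user_id, expires_at = entry
        return slack_user_id if self._clock() < expires_at else None
```

In a database-backed version, the equivalent of `pop` would need to be an atomic find-and-delete so that two concurrent callbacks cannot both redeem the same nonce.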
Alternatives considered:
- Slack bot hosts its own HTTP server for callback — rejected; adds HTTP capability to Python bot, duplicates BFF
- Store link mapping in MongoDB instead of Keycloak: rejected; keeping the mapping in Keycloak centralizes identity and removes the MongoDB dependency from the Slack bot identity path
R-06: Supervisor Test Failures (217 failures)
Decision: Install pytest-asyncio as dev dependency; most failures are missing async test support.
Rationale: The failing tests report "async def functions are not natively supported. You need to install a suitable plugin for your async framework, for example: pytest-asyncio". This is a dependency gap, not a test logic bug. Once the plugin is installed, any remaining failures will be individual test issues.
Additional findings:
- pytest-cov is not installed; it is needed for coverage measurement
- Root tests/ directory is NOT in pyproject.toml testpaths, so tests only run when explicitly invoked via pytest tests/
- The pyproject.toml testpaths only lists ai_platform_engineering/utils and ai_platform_engineering/multi_agents
Action: Add pytest-asyncio, pytest-cov to dev dependencies; add tests/ to testpaths.
R-07: UI Test Failures (218 failures across 25 suites)
Decision: Triage and fix by category — most failures are likely mock/import issues from recent code changes.
Rationale: With 1,876 passing tests and 218 failing, the failures are concentrated in 25 of 105 suites. Common causes in Next.js test suites:
- Module mock mismatches after refactoring (e.g., canViewAdmin removal)
- Async timing issues in waitFor assertions
- Missing mock providers or changed API signatures
Action: Run each failing suite individually, categorize failures, fix in batches. Priority: API tests (admin, auth) → component tests → hook tests.
R-08: User Self-Service RBAC Posture View
Decision: New API route /api/auth/my-roles + read-only panel in user menu (FR-036).
Rationale: The admin user detail modal (FR-033) already aggregates realm roles, teams, per-KB roles, per-agent roles, and IdP source. The self-service view reuses the same data fetching but scoped to the authenticated user's own keycloak_sub. No Keycloak Admin API access needed for the frontend — the BFF route fetches from Keycloak Admin API server-side and returns only the current user's data.
Data sources:
- Keycloak Admin API: realm roles, per-KB/agent roles (parsed from role names), IdP source, account status
- MongoDB: team memberships (from team documents)
- Session: email, name, sub (already available)
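Parsing per-KB and per-agent roles out of role names might look like the sketch below. The naming convention used here (kb:<id>:<role> and agent:<id>:<role>) is purely hypothetical, since these notes do not state the actual format; any name that does not match is treated as a plain realm role.

```python
from collections import defaultdict

SCOPED_PREFIXES = ("kb", "agent")  # hypothetical role-name prefixes


def group_roles(role_names: list[str]) -> dict:
    """Split role names into realm roles and per-resource role maps."""
    realm_roles: list[str] = []
    scoped = {prefix: defaultdict(list) for prefix in SCOPED_PREFIXES}
    for name in role_names:
        parts = name.split(":")
        if len(parts) == 3 and parts[0] in SCOPED_PREFIXES:
            kind, resource_id, role = parts
            scoped[kind][resource_id].append(role)
        else:
            realm_roles.append(name)
    # Plain dicts serialize cleanly in a JSON API response.
    return {
        "realm_roles": realm_roles,
        "kb_roles": dict(scoped["kb"]),
        "agent_roles": dict(scoped["agent"]),
    }


# Sample role names are invented for illustration.
posture = group_roles(["admin", "kb:docs:reader", "agent:github:user"])
```

A BFF route such as /api/auth/my-roles could return a structure like this after filtering to the authenticated user's own roles.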
Alternatives considered:
- Extract from JWT claims only — rejected; JWT doesn't contain team memberships or per-KB/agent role details
- Link to Keycloak Account Console — rejected; poor UX, requires separate login, doesn't show CAIPE teams