
RBAC Workflows

Sequence diagrams and flow narratives for "what happens when X". Pair this with Architecture (which describes each component); this doc is about how those components interact over time.

If you only have 5 minutes, read Per-request authorization: it's the most important diagram in CAIPE.


Login + First-Time Broker Login

This is the once-per-session flow. After it completes, the user holds a Keycloak-backed UI session and usually never sees Keycloak (or the upstream IdP, such as Okta or Duo SSO) again until the Keycloak SSO session can no longer be refreshed.

The default Keycloak "first broker login" flow shows a "Review Profile" page and, if a local account with the same email already exists, a "Confirm Link Account" page. Both are eliminated by the custom flow patched in by init-idp.sh:

caipe-silent-broker-login  (both executions: ALTERNATIVE)
│
├── idp-create-user-if-unique
│     Condition: no local user with this email exists
│     Action: provision new Keycloak user, assign default roles
│
└── idp-auto-link
      Condition: local user with matching email already exists
      Action: link external identity to existing account silently

This only works correctly because trustEmail=true is set on the IdP. That flag tells Keycloak to treat the email claim from the upstream IdP (Okta, Duo SSO, Azure AD, …) as authoritative for account matching.

Production installs should also keep keycloak.idp.forceRedirect=true (exported to KEYCLOAK_FORCE_IDP_REDIRECT=true). That makes the app realm's browser flow require the IdP redirector and disables the local Keycloak username/password form, so users go to the enterprise IdP even if a client does not send kc_idp_hint.

The other side of the kc_idp_hint contract is on the UI: ui/src/lib/auth-config.ts only spreads kc_idp_hint=${OIDC_IDP_HINT} into the NextAuth authorization params when OIDC_IDP_HINT is set and non-empty. Two tests pin this contract end-to-end:

  • ui/src/lib/__tests__/auth-config.test.ts (describe OIDC kc_idp_hint forwarding) asserts the hint is forwarded verbatim when configured and is omitted when the env var is unset or empty.
  • tests/integration/test_keycloak_idp_hint_redirect.sh boots a throwaway Keycloak, runs init-idp.sh, and asserts that GET /realms/caipe/protocol/openid-connect/auth returns a 302/303 redirect to /broker/${IDP_ALIAS}/login both with no hint and with a valid hint, and that it degrades gracefully (no 5xx) with an unknown hint.

Both tests are gated in CI via the idp-hint-test job in .github/workflows/ci-keycloak-init.yml (also runnable locally via make test-keycloak-idp-hint or make test-keycloak-sso-all). See Secrets bootstrap → SSO bootstrap: kc_idp_hint and the IdP redirector for the long-form walkthrough.
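The conditional spread that these tests pin down can be sketched as below. This is a minimal illustration, not the contents of ui/src/lib/auth-config.ts: the helper name buildAuthorizationParams and the base scope string are hypothetical, and only the kc_idp_hint handling follows the documented contract.

```typescript
// Hypothetical helper (not the real auth-config.ts export): returns the
// authorization params a NextAuth OIDC provider would send to Keycloak.
type AuthorizationParams = Record<string, string>;

function buildAuthorizationParams(idpHint: string | undefined): AuthorizationParams {
  return {
    scope: "openid email profile", // assumed base scope, illustration only
    // Forward the hint verbatim only when OIDC_IDP_HINT is set and non-empty;
    // an unset or empty value must not produce `kc_idp_hint=` on the wire.
    ...(idpHint ? { kc_idp_hint: idpHint } : {}),
  };
}

// In the app this value would come from process.env.OIDC_IDP_HINT.
const withHint = buildAuthorizationParams("okta");
const withoutHint = buildAuthorizationParams("");
```

Omitting the key entirely, rather than sending an empty value, leaves routing to the realm's browser flow (and to keycloak.idp.forceRedirect when it is enabled).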

Security implication: if the upstream IdP can be compromised to issue arbitrary email claims, an attacker could link to any existing account. This is acceptable here because Okta and Duo SSO (and other supported IdPs) are corporate SSO providers, so trust in the email claim is the same as trust in the IdP.

The complete one-time login sequence (Browser → Keycloak → upstream IdP → Keycloak → CAIPE UI) is shown inline in Per-request authorization below; look for the "One-time login path" rectangle. With the default realm policy, active users refresh silently through Keycloak for up to the configured SSO idle and max lifetimes.


Per-Request Authorization (End to End)

This is the RBAC sequence diagram. It traces a single Slack message ("list my ArgoCD apps") all the way through OBO token exchange, supervisor middleware, AgentGateway extAuthz / OpenFGA evaluation, and into the MCP server. JWKS refresh and one-time login timelines run alongside the hot path and the diagram shows how they converge.

Read this diagram as four independent timelines that happen to converge:

  1. Policy timeline: admins change ReBAC relationships through the OpenFGA/ReBAC UI and team resource APIs. Those writes update MongoDB provenance and OpenFGA tuples; AgentGateway does not maintain a CEL policy CRUD surface or Mongo-backed config bridge.
  2. Key timeline: Keycloak publishes its signing keys on a public endpoint. AG fetches them lazily (at startup, on TTL expiry, or on an unknown kid). Keycloak is not a runtime dependency of AG: requests succeed even if Keycloak is briefly unreachable, as long as the cached JWKS has a valid key for the JWT's kid.
  3. Login timeline: Duo SSO authenticates the human at the start of the Keycloak SSO session. Keycloak exchanges that Duo assertion for CAIPE tokens; the UI keeps refreshing 1-hour access tokens through Keycloak while the 8-hour idle / 24-hour max SSO session remains valid.
     CAIPE UI keeps large OAuth tokens in a server-side token cache and stores only slim session metadata in the httpOnly cookie. If that cache is lost after a UI restart, RAG and token-enforced data-plane calls redirect through login, while Dynamic Agents browser proxy routes can still forward the signed-in X-User-Context fallback for configuration and save flows; AgentGateway-backed MCP probes and tools may still require a fresh Keycloak bearer.
     On every successful CAIPE login, the BFF reconciles OpenFGA tuples for users who passed OIDC_REQUIRED_GROUP. Those tuples come from the admin-managed default OpenFGA grant profile bundle when present, otherwise from the built-in Org Member / Org Admin defaults. Users in OIDC_REQUIRED_ADMIN_GROUP or BOOTSTRAP_ADMIN_EMAILS receive the selected admin profile grants. Team-assigned custom profiles override the global member/admin profile for matching team users and are materialized as direct user tuples; multiple team overrides union with each other. Admin changes in Security & Policy → OpenFGA → Default FGA Grants can save future-login templates and reconcile all known users immediately. If enabled, CAIPE also uses the login-time memberOf / groups claims to reconcile only the signed-in user's managed team memberships. This claim path is additive; full inventory, removals, and drift still come from direct Okta/AD API sync.
     Duo is not on the request hot path: it is only touched on login or when Keycloak/upstream IdP policy requires interactive reauthentication. AgentGateway only needs to understand the Keycloak-signed JWT and the OpenFGA decision.
  4. Request timeline: the OBO JWT carries the user's identity and roles end-to-end. The same token is verified by AG (edge) and optionally re-verified by the MCP server (depth). This is deliberate: a compromised AG cannot get a forged token past an MCP server that re-verifies the signature.
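The Key timeline's lazy-fetch rule can be sketched as a small cache. This is a hedged sketch, not AgentGateway code: the LazyJwksCache class, the fetcher signature, the TTL, and the trimmed Jwk shape are assumptions, and signature verification is left out.

```typescript
// Illustrative JWK shape; real keys carry more fields (alg, use, n, e, ...).
interface Jwk {
  kid: string;
  kty: string;
}

// Hypothetical fetcher, e.g. a GET on the realm's openid-connect/certs endpoint.
type JwksFetcher = () => Promise<Jwk[]>;

class LazyJwksCache {
  private keys = new Map<string, Jwk>();
  private fetchedAt = Number.NEGATIVE_INFINITY; // forces a fetch on first use
  private readonly fetchJwks: JwksFetcher;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetchJwks: JwksFetcher, ttlMs: number, now: () => number = () => Date.now()) {
    this.fetchJwks = fetchJwks;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async getKey(kid: string): Promise<Jwk | undefined> {
    const expired = this.now() - this.fetchedAt > this.ttlMs;
    // Refetch at startup, on TTL expiry, or on an unknown kid (key rotation).
    if (expired || !this.keys.has(kid)) {
      try {
        const jwks = await this.fetchJwks();
        this.keys = new Map(jwks.map((k): [string, Jwk] => [k.kid, k]));
        this.fetchedAt = this.now();
      } catch {
        // Keycloak unreachable: keep the cached key set, so tokens signed
        // with an already-cached kid still resolve.
      }
    }
    return this.keys.get(kid);
  }
}
```

A production cache would also rate-limit refetches triggered by unknown kid values, so a burst of forged tokens cannot become a burst of JWKS requests against Keycloak.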

Demo tip: when presenting this diagram live, start by highlighting the Login timeline and note "this happens once, then Keycloak refreshes the CAIPE access token while the SSO session is active". Then trace through the Request timeline and ask the audience where Duo appears. The answer is nowhere, because every downstream check uses the Keycloak-signed JWT. This is the clearest way to explain why CAIPE can swap IdPs without touching agent code.


Dynamic Agent Invocation

Dynamic Agent start, invoke, resume, and cancel requests have two authorization layers. The Web UI backend blocks denied callers before any backend proxy call by checking agent use plus conversation write access. Conversation write uses the hybrid model: implicit MongoDB ownership (owner_subject or legacy owner_id) is accepted for private conversations, while shared or delegated writes fall through to explicit OpenFGA conversation:<id> relationships. The Dynamic Agents runtime repeats the agent-use check before agent lookup or runtime work.
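A minimal sketch of the two Web UI backend checks, assuming an injected OpenFGA check function. The relation names can_use and can_write, the function names, and the assumption that the legacy owner_id field stores the same subject as owner_subject are all hypothetical.

```typescript
// Mongo conversation fields named in the text; other fields omitted.
interface ConversationDoc {
  id: string;
  owner_subject?: string;
  owner_id?: string; // legacy owner field (assumed to hold the same subject)
}

// Hypothetical OpenFGA check: (user, relation, object) -> allowed?
type FgaCheck = (user: string, relation: string, object: string) => Promise<boolean>;

async function canWriteConversation(sub: string, conv: ConversationDoc, check: FgaCheck): Promise<boolean> {
  // Private-owner path: implicit Mongo ownership, no per-conversation tuple needed.
  if (conv.owner_subject === sub || conv.owner_id === sub) return true;
  // Shared or delegated writes need an explicit conversation:<id> relationship.
  return check(`user:${sub}`, "can_write", `conversation:${conv.id}`);
}

type Decision = "allow" | "deny_agent" | "deny_conversation";

// Web UI backend gate for start/invoke/resume/cancel, run before any proxy call.
async function authorizeChatRequest(
  sub: string,
  agentId: string,
  conv: ConversationDoc,
  check: FgaCheck,
): Promise<Decision> {
  if (!(await check(`user:${sub}`, "can_use", `agent:${agentId}`))) return "deny_agent";
  if (!(await canWriteConversation(sub, conv, check))) return "deny_conversation";
  return "allow";
}
```

Per the text, the Dynamic Agents runtime repeats only the agent-use check before agent lookup or runtime work; the conversation check lives in the Web UI backend.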

The same sequence applies to POST /api/v1/chat/invoke, POST /api/v1/chat/stream/resume, and POST /api/v1/chat/stream/cancel (cancel does not start runtime work, but it still requires agent use and conversation write authorization). The RBAC Audit tab surfaces Web UI backend and Dynamic Agents OpenFGA decisions as OpenFGA ReBAC rows with pdp=openfga and the checked tuple in resource_ref. MongoDB audit_events is authoritative for compliance and history; Jaeger/OTel can still be enabled for request-flow debugging, but the Admin UI does not need it to show authz decisions.

Slack follow-up bookkeeping uses PATCH /api/chat/conversations/[id]/metadata after a response is posted. That endpoint uses the same implicit-owner-or-explicit conversation write check, so a Slack OBO token for the conversation owner can update thread metadata such as last_processed_ts without a separate conversation:<id>#writer tuple.

Self-Service Resource Creation

Private and team-scoped Dynamic Agents, MCP servers, and RAG data sources use the same OpenFGA-backed create flow. MongoDB persists the resource document, while OpenFGA is the PDP for who can see, use, or manage it.

For private resources, the creator's direct owner tuple derives management rights. For team resources, team members get use/read access and team admins get the manager relationship. Team membership and team-admin status are evaluated by OpenFGA checks; Mongo team fields are metadata and compatibility context, not the primary authorization decision.
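The tuples a create call writes can be sketched as below. The relation names owner, can_use, and manager are assumptions modeled on tuples quoted elsewhere in this doc, and object types other than agent (for example mcp_server) are hypothetical.

```typescript
// Tuple keys in OpenFGA's user / relation / object form.
interface TupleKey {
  user: string;
  relation: string;
  object: string;
}

type ResourceScope =
  | { kind: "private"; creatorSub: string }
  | { kind: "team"; teamSlug: string };

// objectType is e.g. "agent"; other type names here are assumptions.
function creationTuples(objectType: string, id: string, scope: ResourceScope): TupleKey[] {
  const object = `${objectType}:${id}`;
  if (scope.kind === "private") {
    // A single direct owner tuple; management rights derive from it in the model.
    return [{ user: `user:${scope.creatorSub}`, relation: "owner", object }];
  }
  // Team scope: members get use/read, team admins get the manager relationship.
  return [
    { user: `team:${scope.teamSlug}#member`, relation: "can_use", object },
    { user: `team:${scope.teamSlug}#admin`, relation: "manager", object },
  ];
}
```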

Credential OAuth Connector Flow

The Connections & Secrets OAuth connector flow is a CAIPE credential-exchange flow, not a Keycloak login broker flow. Provider client IDs/secrets are seeded from .env in Docker Compose or ESO in Kubernetes into encrypted MongoDB connector records. Users then create or relink per-provider connections from the Connections page. The browser navigates in the same tab so the OAuth callback keeps the signed-in CAIPE session context:

The /credentials page is feature-flagged by CAIPE_CREDENTIALS_ENABLED and then gated by OpenFGA organization membership (can_use organization:<org_key>). The Admin → Settings → Credentials tab is separately feature-flagged and visible only for organization admins (can_manage organization:<org_key>), even if a non-admin has read-only baseline admin surface grants.

The browser never receives provider tokens or decrypted secret material. Local development may use http://localhost redirect URIs, but production connector redirect URIs must use HTTPS. The final callback page includes a return link and still broadcasts a connection event for tabs that are listening. Built-in GitHub and Webex connector bootstrap normalizes legacy local http://localhost:3001/oauth/{provider}/callback values to the CAIPE UI callback route at /api/credentials/oauth/{provider}/callback, so the provider returns to the BFF route that stores the encrypted token set.
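A hedged sketch of the redirect URI rules above. normalizeConnectorRedirectUri and its parameters are hypothetical; it assumes the production/development distinction is known at bootstrap time and only rewrites the exact legacy value for the built-in GitHub and Webex connectors.

```typescript
function tryParseUrl(u: string): URL | null {
  try {
    return new URL(u);
  } catch {
    return null;
  }
}

// Hypothetical bootstrap helper, not the real connector code.
function normalizeConnectorRedirectUri(
  provider: string,
  configured: string,
  uiBaseUrl: string,
  production: boolean,
): string {
  const builtIn = provider === "github" || provider === "webex";
  const legacy = `http://localhost:3001/oauth/${provider}/callback`;
  // Point legacy local values at the BFF route that stores the encrypted token set.
  const uri = builtIn && configured === legacy
    ? `${uiBaseUrl.replace(/\/+$/, "")}/api/credentials/oauth/${provider}/callback`
    : configured;
  const parsed = tryParseUrl(uri);
  if (parsed === null) throw new Error(`invalid redirect URI: ${uri}`);
  // Production connectors must use HTTPS; plain http is tolerated only in local dev.
  if (production && parsed.protocol !== "https:") {
    throw new Error(`production redirect URI must use https: ${uri}`);
  }
  return uri;
}
```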

After a connection exists, the Connections page can run Check GitHub Profile, Check Atlassian Profile, Check Webex Profile, or Check PagerDuty Profile. The browser calls the BFF profile-check route for its own connection id; the BFF verifies the session, loads only connections owned by the signed-in Keycloak sub, refreshes the provider token server-side, calls the provider profile endpoint, and returns a small redacted profile summary. Atlassian checks also fall back to /oauth/token/accessible-resources when the User Identity /me endpoint returns 403, so operators can distinguish a valid OAuth grant from a denied profile API. The route also returns a redacted diagnostics checklist for the Connections page modal: connection ownership, refresh-token acceptance, provider profile status, and Atlassian accessible-resource/scope status where applicable. Each diagnostic includes operator guidance such as relinking the provider or asking an Atlassian admin to review User Identity API access. The route never returns the OAuth access or refresh token.

The Connections page also performs an automatic, browser-safe refresh pass on load for connected providers whose access token is expired or within the refresh threshold. The BFF POST /api/credentials/connections/{id}/refresh route verifies the same session ownership, refreshes the provider token server-side, persists the new encrypted access-token reference/expiry metadata, and returns only refresh metadata (ok, provider, and expiry interval), never token material.
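The on-load selection can be sketched as a pure filter over connection metadata. The field names, the millisecond units, and the threshold parameter are assumptions; the actual threshold value is not stated here.

```typescript
// Metadata the page can see; token material never reaches the browser.
interface ConnectionSummary {
  id: string;
  provider: string;
  connected: boolean;
  accessTokenExpiresAt: number; // epoch milliseconds
}

// Hypothetical selection rule for the on-load refresh pass: expired tokens
// and tokens expiring within the threshold are both refreshed.
function connectionsToRefresh(conns: ConnectionSummary[], nowMs: number, thresholdMs: number): string[] {
  return conns
    .filter((c) => c.connected && c.accessTokenExpiresAt - nowMs <= thresholdMs)
    .map((c) => c.id);
}

// Metadata-only response shape for the refresh route; the field name
// expires_in is an assumption.
interface RefreshResponse {
  ok: boolean;
  provider: string;
  expires_in: number; // seconds
}
```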

Runtime callers that need a provider access token use POST /api/credentials/exchange instead. That route is non-browser only: it rejects Origin/Referer/cookie requests, validates the service bearer JWT, checks the expected credential-service audience header, and supports either an explicit provider_connection_id or a provider key such as atlassian. Provider-key exchange selects the connected provider record owned by the JWT sub, so a Dynamic Agent invocation receives the signed-in user's Atlassian token without hard-coding a per-user connection id. Explicit connection-id exchange still requires either ownership by JWT sub or OpenFGA secret_ref:provider_connection:<id>#can_use before returning a refreshed provider access token.
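The exchange route's browser rejection and connection selection can be sketched as below. This assumes lower-cased header names and an injected canUse callback standing in for the OpenFGA secret_ref:provider_connection:<id>#can_use check; service-JWT validation and the audience-header check are omitted, and all names are hypothetical.

```typescript
interface ProviderConnection {
  id: string;
  provider: string; // e.g. "atlassian"
  owner_sub: string;
  connected: boolean;
}

interface ExchangeRequest {
  provider_connection_id?: string;
  provider?: string;
}

// Any browser-shaped signal is enough to reject the request.
function isBrowserRequest(headers: Record<string, string | undefined>): boolean {
  return ["origin", "referer", "cookie"].some((h) => headers[h] !== undefined);
}

function selectConnection(
  sub: string,
  req: ExchangeRequest,
  conns: ProviderConnection[],
  canUse: (connectionId: string) => boolean,
): ProviderConnection | undefined {
  if (req.provider_connection_id !== undefined) {
    const c = conns.find((x) => x.id === req.provider_connection_id);
    // Explicit id: owned by the JWT sub, or shared via an OpenFGA grant.
    return c && (c.owner_sub === sub || canUse(c.id)) ? c : undefined;
  }
  if (req.provider !== undefined) {
    // Provider-key exchange only ever selects a record owned by the JWT sub.
    return conns.find((x) => x.provider === req.provider && x.owner_sub === sub && x.connected);
  }
  return undefined;
}
```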

For Jira MCP, Dynamic Agents keeps the user's Keycloak JWT on Authorization for MCP authentication and injects the exchanged Atlassian OAuth token on X-CAIPE-Provider-Token. Jira treats that header as a provider Bearer token and does not require ATLASSIAN_EMAIL for that OAuth path; static API-token Basic auth remains available when impersonation tokens are disabled.

GET /api/credentials/inject/atlassian is implemented as the future BFF contract for AgentGateway-style provider-token injection, but AgentGateway v0.12 does not support backend-level HTTP extAuthz response-header injection. Until that gateway capability exists, the active Jira path keeps the exchange in the connector/runtime layer: Dynamic Agents resolves the user-specific Atlassian token through credential exchange and Jira MCP consumes it from X-CAIPE-Provider-Token.
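Both ends of the active Jira path can be sketched as header handling. The Bearer scheme on Authorization, passing the raw token in X-CAIPE-Provider-Token, and falling back to static API-token auth when the header is absent are assumptions; the function names are hypothetical.

```typescript
type HeaderMap = Record<string, string>;

// Dynamic Agents side: keep the Keycloak JWT for MCP auth, add the provider token.
function buildJiraMcpHeaders(keycloakJwt: string, atlassianToken?: string): HeaderMap {
  const headers: HeaderMap = { Authorization: `Bearer ${keycloakJwt}` };
  if (atlassianToken) headers["X-CAIPE-Provider-Token"] = atlassianToken;
  return headers;
}

type AtlassianAuth =
  | { kind: "oauth"; authorization: string } // per-user OAuth, no ATLASSIAN_EMAIL needed
  | { kind: "static_api_token" }; // Basic auth from server configuration

// Jira MCP side: prefer the per-user provider token when it is present.
function resolveAtlassianAuth(incoming: HeaderMap): AtlassianAuth {
  // HTTP header names are case-insensitive, so match without case.
  const key = Object.keys(incoming).find((k) => k.toLowerCase() === "x-caipe-provider-token");
  const token = key !== undefined ? incoming[key] : undefined;
  return token ? { kind: "oauth", authorization: `Bearer ${token}` } : { kind: "static_api_token" };
}
```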

Webex Space ReBAC and Bot Dispatch

Webex follows the Slack bot trust model with Webex spaces in place of channels. The bot treats Webex as an external event source, not as an identity provider: every protected message must map to a Keycloak user, a CAIPE team, an OpenFGA-backed space route, and a user/resource allow decision before dispatch.

Failure categories are explicit and fail closed: WEBEX_USER_NOT_LINKED, WEBEX_WORKSPACE_UNCONFIGURED, WEBEX_SPACE_TEAM_NOT_FOUND, WEBEX_OBO_FAILED, WEBEX_ROUTE_DENIED, missing_space_grant, and pdp_unavailable. Audit records use component=webex_bot and hash Webex person IDs before logging.
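The fail-closed dispatch order can be sketched as a chain of lookups that returns the first failure category. The check order, the mapping of missing_space_grant versus WEBEX_ROUTE_DENIED, and the WebexDeps interface are assumptions; hashing person IDs for the audit record is not shown.

```typescript
type WebexDenial =
  | "WEBEX_USER_NOT_LINKED"
  | "WEBEX_WORKSPACE_UNCONFIGURED"
  | "WEBEX_SPACE_TEAM_NOT_FOUND"
  | "WEBEX_OBO_FAILED"
  | "WEBEX_ROUTE_DENIED"
  | "missing_space_grant"
  | "pdp_unavailable";

// Hypothetical lookups; `undefined` from a PDP call means "PDP unreachable".
interface WebexDeps {
  workspaceConfigured: boolean;
  keycloakSubFor(personId: string): string | undefined;
  teamForSpace(roomUuid: string): string | undefined;
  spaceHasAgentGrant(roomUuid: string, agentId: string): boolean | undefined;
  mintObo(sub: string): string | undefined;
  userCanUseAgent(sub: string, agentId: string): boolean | undefined;
}

type WebexDecision =
  | { ok: true; sub: string; team: string; oboToken: string }
  | { ok: false; reason: WebexDenial };

// Every step must succeed; any missing mapping or PDP error denies dispatch.
function authorizeWebexMessage(personId: string, roomUuid: string, agentId: string, d: WebexDeps): WebexDecision {
  if (!d.workspaceConfigured) return { ok: false, reason: "WEBEX_WORKSPACE_UNCONFIGURED" };
  const sub = d.keycloakSubFor(personId);
  if (sub === undefined) return { ok: false, reason: "WEBEX_USER_NOT_LINKED" };
  const team = d.teamForSpace(roomUuid);
  if (team === undefined) return { ok: false, reason: "WEBEX_SPACE_TEAM_NOT_FOUND" };
  const spaceGrant = d.spaceHasAgentGrant(roomUuid, agentId);
  if (spaceGrant === undefined) return { ok: false, reason: "pdp_unavailable" };
  if (!spaceGrant) return { ok: false, reason: "missing_space_grant" };
  const oboToken = d.mintObo(sub);
  if (oboToken === undefined) return { ok: false, reason: "WEBEX_OBO_FAILED" };
  const userAllowed = d.userCanUseAgent(sub, agentId);
  if (userAllowed === undefined) return { ok: false, reason: "pdp_unavailable" };
  if (!userAllowed) return { ok: false, reason: "WEBEX_ROUTE_DENIED" };
  return { ok: true, sub, team, oboToken };
}
```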

WEBEX_USER_NOT_LINKED is handled privately by default. In a group space, the bot sends only a generic thread notice and delivers the signed SSO link in a 1:1 Webex Adaptive Card addressed to the requesting personId. If the 1:1 send fails, the fallback message asks the user to open the app and retry linking without exposing the signed URL in the shared room. Slack-style implicit Webex profile linking remains an explicit user-choice path and requires strict Webex org, verified-email, no-conflict, and audit checks before it can bind webex_user_id without an SSO click.

For Webex spaces, the raw room UUID is the policy identifier in webex_space:<alias>--<space>. Public Webex room IDs are decoded from ciscospark://us/ROOM/<uuid> before MongoDB/OpenFGA lookups and re-encoded only for outbound Webex API calls.
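The room ID translation can be sketched as below. It assumes public IDs are plain base64 of the ciscospark://us/ROOM/<uuid> URI (padding optional), uses a tiny ASCII-only base64 codec so the sketch has no runtime dependencies, and treats <alias> as an opaque string; the function names are hypothetical.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// ASCII-only base64 encoder with unpadded output.
function b64EncodeAscii(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i += 3) {
    const n = (s.charCodeAt(i) << 16) |
      ((i + 1 < s.length ? s.charCodeAt(i + 1) : 0) << 8) |
      (i + 2 < s.length ? s.charCodeAt(i + 2) : 0);
    out += B64.charAt((n >> 18) & 63) + B64.charAt((n >> 12) & 63);
    if (i + 1 < s.length) out += B64.charAt((n >> 6) & 63);
    if (i + 2 < s.length) out += B64.charAt(n & 63);
  }
  return out;
}

// ASCII-only base64 decoder; trailing "=" padding is optional.
function b64DecodeAscii(s: string): string {
  const clean = s.replace(/=+$/, "");
  let value = 0;
  let bits = 0;
  let out = "";
  for (let i = 0; i < clean.length; i++) {
    const idx = B64.indexOf(clean.charAt(i));
    if (idx < 0) throw new Error(`invalid base64 character: ${clean.charAt(i)}`);
    value = ((value << 6) | idx) & 0xffff; // only the low bits are ever needed
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out += String.fromCharCode((value >> bits) & 0xff);
    }
  }
  return out;
}

const ROOM_PREFIX = "ciscospark://us/ROOM/";

// Public Webex room id -> raw room UUID used for MongoDB/OpenFGA lookups.
function decodeWebexRoomId(publicId: string): string {
  const decoded = b64DecodeAscii(publicId);
  if (decoded.indexOf(ROOM_PREFIX) !== 0) throw new Error("not a Webex room id");
  return decoded.slice(ROOM_PREFIX.length);
}

// Raw UUID -> public id, only for outbound Webex API calls.
function encodeWebexRoomId(roomUuid: string): string {
  return b64EncodeAscii(ROOM_PREFIX + roomUuid);
}

// OpenFGA object id for a space.
function webexSpaceObject(alias: string, roomUuid: string): string {
  return `webex_space:${alias}--${roomUuid}`;
}
```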

Team Creation OpenFGA Sync

When an admin creates a team through POST /api/admin/teams, the Web UI backend synchronizes three pieces of state in one shot: Mongo teams, Mongo team_membership_sources, and OpenFGA membership tuples. The OpenFGA write is what makes team:<slug>#can_use resolve true for the creator on subsequent requests like Dynamic Agent creation. Skipping the OpenFGA step leaves team:<slug>#can_use false even though Mongo has the membership row, and OWNER_TEAM_FORBIDDEN fires on the very next agent-creation API call.

Phase 3 of spec 2026-05-24-derive-team-from-channel removed the per-team Keycloak client scope (team-<slug>). Team creation no longer touches Keycloak; teams are a pure Mongo+OpenFGA concept.

Why the creator gets both admin and member tuples even though admin alone would satisfy can_use (model: can_use = member ∪ admin):

  1. The team Members tab in the Admin UI reads the Mongo members[] array verbatim. If the creator is only stored as role: 'owner', the tab continues to show them as the only member, which matches the visible Mongo intent.
  2. The redundant member tuple keeps the OpenFGA store self-describing: a future read of team:<slug>#member returns every member, including the team admins, instead of silently omitting them.
  3. It costs one extra tuple per creator and removes a class of "I'm an admin, why don't list endpoints that filter by team#member include me?" bugs.

The same helpers (resolveKeycloakUserSubject and writeTeamMembershipTuples in ui/src/lib/rbac/team-membership-sync.ts) are used by POST /api/admin/teams/[id]/members so the add-member path is symmetric: adding a member writes a member tuple, adding an admin writes an admin tuple, and removing the last source for a relation deletes the corresponding tuple.
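The membership sync rule can be sketched as a planner that compares the tuples implied by Mongo source rows with those already in OpenFGA. This is not the team-membership-sync.ts code: planTeamTupleSync, the string tuple encoding, and the role-to-relation table (owner maps to both admin and member, per the creator rule above) are assumptions.

```typescript
type SourceRole = "owner" | "admin" | "member";

interface MembershipSource {
  team_slug: string;
  email: string;
  role: SourceRole;
  user_subject?: string; // empty until the user has signed in to Keycloak
}

// owner (creator) -> admin + member; admin -> admin; member -> member.
const RELATIONS: Record<SourceRole, string[]> = {
  owner: ["admin", "member"],
  admin: ["admin"],
  member: ["member"],
};

const tupleOf = (sub: string, relation: string, slug: string): string => `user:${sub} ${relation} team:${slug}`;

function planTeamTupleSync(
  slug: string,
  sources: MembershipSource[],
  current: string[],
): { writes: string[]; deletes: string[] } {
  const desired = new Set<string>();
  for (const s of sources) {
    if (s.team_slug !== slug || !s.user_subject) continue; // unresolved rows stay pending
    for (const rel of RELATIONS[s.role]) desired.add(tupleOf(s.user_subject, rel, slug));
  }
  const have = new Set(current);
  return {
    writes: Array.from(desired).filter((t) => !have.has(t)).sort(),
    // A tuple is deleted only once no remaining source implies it.
    deletes: current.filter((t) => !desired.has(t)).sort(),
  };
}
```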

Team OpenFGA Sync Diagnostic

Even with the team-creation sync wired up correctly, a team can drift out of step with OpenFGA over time: someone wrote a Mongo source row before the user had logged in to Keycloak (so we couldn't yet resolve their sub), the OpenFGA store was rebuilt without replaying tuples, or a legacy team was created before the sync helpers existed. The Teams settings dialog now surfaces drift directly to the admin instead of hiding it behind backend logs.

The diagnostic reports four states per source row:

| State | Meaning | Admin action |
| --- | --- | --- |
| synced | Source row has user_subject AND OpenFGA contains the matching user:<sub>#<relation> team:<slug> tuple | None |
| pending | Source row exists but user_subject is empty (e.g. the user has never signed in to Keycloak yet) | Wait for first sign-in, or click Reconcile once Keycloak knows the user |
| drifted | user_subject is resolved but OpenFGA is missing the matching tuple | Click Reconcile |
| unknown | OpenFGA read failed or the store is unconfigured | Check OpenFGA health in Security & Policy |

needs_attention on the summary is true if any row is drifted or unknown. pending does not flip the banner red β€” it's an informational state, not a failure mode.
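The four states and the banner rule can be sketched as a pure classifier. The SourceRow shape and string tuple encoding are assumptions, and it assumes an empty user_subject is reported as pending even when OpenFGA is unreachable.

```typescript
type SyncState = "synced" | "pending" | "drifted" | "unknown";

interface SourceRow {
  team_slug: string;
  relation: "member" | "admin";
  user_subject?: string;
}

// `tuples` is null when the OpenFGA read failed or the store is unconfigured.
function classifyRow(row: SourceRow, tuples: Set<string> | null): SyncState {
  if (!row.user_subject) return "pending"; // nothing to compare until first sign-in
  if (tuples === null) return "unknown";
  const key = `user:${row.user_subject} ${row.relation} team:${row.team_slug}`;
  return tuples.has(key) ? "synced" : "drifted";
}

function summarize(rows: SourceRow[], tuples: Set<string> | null): { states: SyncState[]; needs_attention: boolean } {
  const states = rows.map((r) => classifyRow(r, tuples));
  // pending is informational; only drifted or unknown flips the banner.
  const needs_attention = states.some((s) => s === "drifted" || s === "unknown");
  return { states, needs_attention };
}
```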

The Reconcile endpoint is intentionally idempotent: write-on-already-present is a no-op at the OpenFGA layer, so it is safe to invoke repeatedly. It returns unresolved_emails for any source rows whose subject could not be re-resolved (e.g. an invitee's Keycloak account still does not exist), so the admin can chase those manually rather than spinning on the button.

Dynamic Agent Creation Ownership

New Dynamic Agents must be assigned to an owner team during creation. The Web UI backend validates the selected team before writing any agent document: platform admins can choose any team, while scoped team admins can choose only teams where they are admin or owner.
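The owner-team gate can be sketched as below, with OpenFGA team-admin status injected as a callback. OWNER_TEAM_FORBIDDEN appears elsewhere in this doc; OWNER_TEAM_REQUIRED and the function signature are hypothetical.

```typescript
type OwnerTeamCheck =
  | { ok: true }
  | { ok: false; error: "OWNER_TEAM_REQUIRED" | "OWNER_TEAM_FORBIDDEN" };

// Runs before any agent document is written. isTeamAdmin stands in for an
// OpenFGA check that the caller is admin or owner of the given team.
function validateOwnerTeam(
  ownerTeam: string | undefined,
  isPlatformAdmin: boolean,
  isTeamAdmin: (slug: string) => boolean,
): OwnerTeamCheck {
  if (!ownerTeam) return { ok: false, error: "OWNER_TEAM_REQUIRED" };
  if (isPlatformAdmin) return { ok: true }; // platform admins may pick any team
  return isTeamAdmin(ownerTeam) ? { ok: true } : { ok: false, error: "OWNER_TEAM_FORBIDDEN" };
}
```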


AgentGateway MCP Endpoint Routing

Invariant. Every MCP server routed through AgentGateway must persist an endpoint of the form {agentgateway_base}/mcp/<server_id>. AgentGateway dispatches by path prefix (/mcp/<target>); a bare {agentgateway_base}/mcp falls through to a non-registered route and returns HTTP 404 Not Found on every probe and tool call. The class first surfaced in production as the Confluence card showing:

Failed to connect to MCP server: HTTP 404 Not Found from http://agentgateway:4000/mcp

Defence in depth. The invariant is enforced across the six layers below. Each one closes a different path by which a mis-shaped endpoint could appear or be served, so together they keep a bad endpoint from persisting for long:

| Layer | What it does | Code |
| --- | --- | --- |
| Save-side normaliser (BFF) | POST/PUT /api/mcp-servers rewrites a bare gateway URL to /<server_id> form before insert/update, preventing future drift | ui/src/lib/rbac/mcp-endpoint-normalizer.ts, ui/src/app/api/mcp-servers/route.ts |
| Editor picker (UI) | MCPServerEditor calls /api/mcp-servers/agentgateway/discover on open and offers a Pick AgentGateway target row that fills the endpoint with the canonical /<id> form | ui/src/components/dynamic-agents/MCPServerEditor.tsx |
| Read-side self-heal (runtime) | build_mcp_connection_config in dynamic-agents re-normalises against AGENT_GATEWAY_URL before handing the URL to the MCP transport, so legacy rows still work until repaired | ai_platform_engineering/dynamic_agents/src/dynamic_agents/services/mcp_client.py, services/mcp_endpoint_normalizer.py |
| Standalone config reconciliation (Docker Compose) | agentgateway-config-bridge polls MongoDB for enabled AgentGateway-managed mcp_servers rows, renders one hot-reloaded standalone route per server, and writes the generated config volume consumed by AgentGateway | deploy/agentgateway/config_bridge.py, deploy/agentgateway/Dockerfile.config-bridge, docker-compose.dev.yaml |
| Native Kubernetes routing (Helm) | The umbrella chart renders AgentGateway-native AgentgatewayBackend and HTTPRoute resources for the built-in Knowledge Base target and any configured global.agentgateway.extraMcpTargets | charts/ai-platform-engineering/templates/agentgateway-mcp.yaml, charts/ai-platform-engineering/values.yaml |
| One-shot repair script | scripts/fix-mcp-endpoint-routing.ts audits the mcp_servers collection (dry-run by default) and rewrites mis-shaped rows under --apply | scripts/fix-mcp-endpoint-routing.ts |

Direct upstream URLs are never rewritten. AgentGateway routing is opt-in per server, and silently rewriting http://mcp-confluence:8000/mcp would break stdio and in-cluster topologies. The normaliser detects gateway endpoints by origin match against the configured AGENT_GATEWAY_URL; anything else passes through unchanged.
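The origin-match rule can be sketched as a small normaliser. This is not the code in mcp-endpoint-normalizer.ts: normalizeMcpEndpoint is hypothetical, and it assumes only the bare forms ({base} and {base}/mcp) are rewritten while any existing /mcp/<target> suffix is left for the repair script to judge.

```typescript
function tryParseUrl(u: string): URL | null {
  try {
    return new URL(u);
  } catch {
    return null;
  }
}

function normalizeMcpEndpoint(endpoint: string, serverId: string, gatewayBase: string): string {
  const url = tryParseUrl(endpoint);
  const base = tryParseUrl(gatewayBase);
  if (url === null || base === null) return endpoint; // not an HTTP endpoint
  // Direct upstream URLs (different origin) pass through unchanged.
  if (url.origin !== base.origin) return endpoint;
  const basePath = base.pathname.replace(/\/+$/, "");
  const path = url.pathname.replace(/\/+$/, "");
  if (path === basePath || path === `${basePath}/mcp`) {
    return `${base.origin}${basePath}/mcp/${serverId}`;
  }
  return endpoint; // already /mcp/<target>: leave the chosen target alone
}
```

Values that do not parse as URLs (such as stdio commands) also pass through unchanged.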

Config-driven rows are never rewritten by the repair script. Their source of truth is config.yaml. If a config-driven row is mis-shaped, the script logs it under untouchedConfigDriven so an operator can fix the YAML instead.

Operator workflow for the repair script:

# Dry-run: prints which rows would change; no Mongo writes.
MONGODB_URI=mongodb://... \
AGENT_GATEWAY_URL=http://agentgateway:4000 \
npx ts-node scripts/fix-mcp-endpoint-routing.ts

# Apply the repairs (idempotent: re-running is a no-op).
MONGODB_URI=mongodb://... \
AGENT_GATEWAY_URL=http://agentgateway:4000 \
npx ts-node scripts/fix-mcp-endpoint-routing.ts --apply

The dry-run output includes a reason for each candidate (bare_gateway_base, gateway_root_only, wrong_target_suffix) so the admin can sanity-check the proposed change before committing.

Testing the repair script: the pure helpers (normalizeMcpEndpointForServer, buildRepairPlan) are covered by scripts/__tests__/fix-mcp-endpoint-routing.test.ts and run with the same invocation as the other scripts/__tests__/*.test.ts files:

npx ts-node --compiler-options '{"module":"CommonJS"}' \
scripts/__tests__/fix-mcp-endpoint-routing.test.ts

The Mongo IO half of the script (main() / MongoClient.connect()) is left to live verification because it needs a real database; the pure helpers cover every classification path (bare_gateway_base, gateway_root_only, wrong_target_suffix), the safety rules (direct upstream / config-driven / no _id / no endpoint), and the customisable AgentGateway base URL.
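The classification and safety rules can be sketched as a pure planner. planEndpointRepairs is not the script's buildRepairPlan; the config_driven field, using _id as the server id, and the mapping of reason codes to URL shapes (gateway_root_only for {base}, bare_gateway_base for {base}/mcp, wrong_target_suffix for /mcp/<other>) are assumptions.

```typescript
type RepairReason = "bare_gateway_base" | "gateway_root_only" | "wrong_target_suffix";

interface McpServerRow {
  _id?: string;
  endpoint?: string;
  config_driven?: boolean; // hypothetical flag for rows sourced from config.yaml
}

interface RepairPlan {
  changes: { id: string; from: string; to: string; reason: RepairReason }[];
  untouchedConfigDriven: string[];
  skipped: number;
}

function tryParseUrl(u: string): URL | null {
  try {
    return new URL(u);
  } catch {
    return null;
  }
}

function classifyEndpoint(endpoint: string, id: string, gatewayBase: string): RepairReason | null {
  const url = tryParseUrl(endpoint);
  const base = tryParseUrl(gatewayBase);
  if (url === null || base === null || url.origin !== base.origin) return null; // direct upstream
  const basePath = base.pathname.replace(/\/+$/, "");
  const path = url.pathname.replace(/\/+$/, "");
  if (path === basePath) return "gateway_root_only";
  if (path === `${basePath}/mcp`) return "bare_gateway_base";
  if (path.indexOf(`${basePath}/mcp/`) === 0 && path !== `${basePath}/mcp/${id}`) return "wrong_target_suffix";
  return null; // already canonical
}

function planEndpointRepairs(rows: McpServerRow[], gatewayBase: string): RepairPlan {
  const plan: RepairPlan = { changes: [], untouchedConfigDriven: [], skipped: 0 };
  const canonicalBase = gatewayBase.replace(/\/+$/, "");
  for (const row of rows) {
    if (!row._id || !row.endpoint) {
      plan.skipped += 1; // no _id or no endpoint: nothing safe to rewrite
      continue;
    }
    const reason = classifyEndpoint(row.endpoint, row._id, gatewayBase);
    if (reason === null) continue;
    if (row.config_driven) {
      plan.untouchedConfigDriven.push(row._id); // operator fixes config.yaml instead
      continue;
    }
    plan.changes.push({ id: row._id, from: row.endpoint, to: `${canonicalBase}/mcp/${row._id}`, reason });
  }
  return plan;
}
```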


0.5.1 Schema Migration Tab

Admins run release migrations from Admin β†’ System β†’ Migrations. The tab loads the 0.5.1 migration manifest, lets the admin select and dry-run each migration, and requires typing the exact confirmation string before applying writes.

init-idp.sh remains the first-run bootstrap escape hatch because it runs before the Web UI backend is healthy and can use direct Keycloak admin credentials. It prevents a chicken-and-egg dependency where BFF startup needs Keycloak client/realm state that only BFF startup could create.

After that bootstrap, the Web UI backend owns the long-term Keycloak reconciliation path through keycloak_rbac_mapping_reconciliation_v1. This migration is code-backed in TypeScript rather than shell-backed by init-idp.sh. On BFF startup it:

  • reconciles bot OBO permissions (token-exchange decision strategy, service-account impersonation roles, realm-level users.impersonate scope-permission);
  • resolves BOOTSTRAP_ADMIN_EMAILS to Keycloak user ids and creates passwordless verified placeholders for bootstrap emails that have not logged in;
  • writes durable OpenFGA admin tuples;
  • records the run in Mongo migration tables, and leaves a blocking migration status if the Keycloak repair fails.

(Phase 3 of spec 2026-05-24-derive-team-from-channel removed the per-team and personal client-scope branches from this migration; teams no longer touch Keycloak.)

The header checks GET /api/rbac/migration-status for every authenticated UI session so non-admin users see the same "migrations required" indicator.

Admins can inspect persisted Keycloak run details, counts, warnings, and errors from GET /api/admin/keycloak/migration-health in Admin → Security & Policy → Keycloak. The panel surfaces five high-signal tiles at the top (Schema area / Version / Migration status / Last actor / Bootstrap admins) and, below them, the Keycloak Invariants section, whose per-row Fix buttons are the actionable source of truth for OBO token-exchange permission strategy, attached OBO policies, and service-account impersonation roles. Bootstrap-admin diagnostics (configured emails, resolved Keycloak subjects, placeholder creations, tuple writes, per-email warnings) are still inspectable through the Bootstrap admins tile at the top of the panel. If the stored run is failed or the keycloak_rbac_mappings schema area is behind, the Reconcile now button posts to the existing migration apply route for keycloak_rbac_mapping_reconciliation_v1 and then reloads the health panel from Mongo.

Dynamic Agent migrations include both tool tuple reconciliation and agent_org_admin_inheritance_v1, which backfills organization:<org>#admin manager agent:<id> for existing agents so organization admins inherit can_manage without assigning owner teams to legacy records.
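The backfill can be sketched as a planner that emits one organization admin manager tuple per agent and skips tuples that already exist, which keeps re-runs idempotent. The function name and the string encoding used for existing tuples are hypothetical.

```typescript
interface TupleKey {
  user: string;
  relation: string;
  object: string;
}

// organization:<org>#admin manager agent:<id>, for every agent missing it.
function orgAdminInheritanceBackfill(orgKey: string, agentIds: string[], existing: Set<string>): TupleKey[] {
  const user = `organization:${orgKey}#admin`;
  return agentIds
    .map((id) => ({ user, relation: "manager", object: `agent:${id}` }))
    .filter((t) => !existing.has(`${t.user} ${t.relation} ${t.object}`));
}
```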

Conversation authorization after the migration remains hybrid: if the caller owns the conversation by owner_subject or legacy owner_id, the Web UI backend allows the private owner path without a per-conversation OpenFGA owner tuple. Non-owners must still pass explicit OpenFGA checks for shared conversation access.


OBO Token Exchange: Bot Identity Propagation

Badge analogy: The Slack or Webex bot is a courier service. When Alice asks the courier to pick something up from the server room on her behalf, the courier can't use their own badge, because the server room requires Alice's clearance. Instead, the courier goes to HR (Keycloak), presents their credentials and Alice's employee ID, and HR issues a delegated badge: it opens the same doors as Alice's badge, but it has a second chip that says "issued on behalf of Alice, presented by courier bot." The delegation chain is physically stamped on the badge, so it's auditable and unforgeable.

This is the hardest part to get right technically. Without OBO, every Slack or Webex request would carry the bot's service account identity. OpenFGA would evaluate the bot instead of the human, and all per-user/team authorization would be meaningless.

RFC 8693 Token Exchange

OBO (On-Behalf-Of) is implemented via RFC 8693 token exchange. The bot uses its client_credentials grant to request a token impersonating a specific Keycloak user:

POST /realms/caipe/protocol/openid-connect/token
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&client_id=slack-bot
&client_secret=<bot-secret>
&subject_token=<bot-access-token>
&subject_token_type=urn:ietf:params:oauth:token-type:access_token
&requested_subject=<keycloak-user-id>
&requested_token_type=urn:ietf:params:oauth:token-type:access_token
&audience=${CAIPE_PLATFORM_AUDIENCE:-caipe-platform}
&scope=openid
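The form body above can be built programmatically as sketched below. buildOboExchangeBody and its input shape are hypothetical; it requests scope=openid with no team scope, per the Phase 3 note in this section, and defaults the audience to caipe-platform.

```typescript
interface OboExchangeInput {
  clientId: string; // e.g. "slack-bot"
  clientSecret: string;
  botAccessToken: string; // from the bot's client_credentials grant
  keycloakUserId: string; // the human being impersonated
  audience?: string; // CAIPE_PLATFORM_AUDIENCE, if set
}

const ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token";

// URLSearchParams handles the application/x-www-form-urlencoded encoding.
function buildOboExchangeBody(input: OboExchangeInput): URLSearchParams {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:token-exchange",
    client_id: input.clientId,
    client_secret: input.clientSecret,
    subject_token: input.botAccessToken,
    subject_token_type: ACCESS_TOKEN_TYPE,
    requested_subject: input.keycloakUserId,
    requested_token_type: ACCESS_TOKEN_TYPE,
    audience: input.audience ?? "caipe-platform",
    scope: "openid",
  });
}
```

Passing the URLSearchParams object as a fetch body to /realms/caipe/protocol/openid-connect/token sets the application/x-www-form-urlencoded content type automatically.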

Keycloak responds with an OBO JWT where:

  • sub = the impersonated user's Keycloak ID
  • email = the user's email
  • act.sub = the bot's client ID, so the delegation chain is cryptographically recorded
  • aud includes caipe-platform by default because the bot's immediate next hop is the CAIPE UI BFF access-check/proxy surface, not AgentGateway
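After a consumer verifies the OBO JWT signature against Keycloak's JWKS, the claim expectations above can be checked as sketched below. checkOboClaims and its expected-values object are hypothetical, and claims are assumed to be already decoded.

```typescript
interface OboClaims {
  sub: string;
  email?: string;
  act?: { sub?: string };
  aud: string | string[];
}

// Post-verification claim checks only; returns a list of problems (empty = ok).
function checkOboClaims(
  claims: OboClaims,
  expected: { userId: string; botClientId: string; audience: string },
): string[] {
  const problems: string[] = [];
  if (claims.sub !== expected.userId) problems.push("sub is not the impersonated user");
  if (claims.act?.sub !== expected.botClientId) problems.push("act.sub does not name the delegating bot");
  const auds = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (auds.indexOf(expected.audience) < 0) problems.push(`aud does not include ${expected.audience}`);
  return problems;
}
```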

Phase 3 of spec 2026-05-24-derive-team-from-channel removed the legacy active_team JWT claim. The bot no longer requests a team-* scope and the OBO token no longer carries a team identifier. Team identity for a Slack channel or Webex space is now derived at request time from channel_team_mappings / webex_space_team_mappings (see "Channel-message dispatch" below). DM dispatch follows the personal chain (override → preference → dm_agent_id → default_agent_id → deny).
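The personal DM chain can be sketched as a first-match lookup. Only dm_agent_id and default_agent_id are named in the text; the override and preference field names are hypothetical.

```typescript
interface DmDispatchContext {
  override?: string; // live per-user DM override
  preference?: string; // the user's saved DM default
  dm_agent_id?: string; // configured DM agent
  default_agent_id?: string; // configured fallback agent
}

// Returns the agent for a DM, or null to deny. Channel/team mappings are not consulted.
function resolveDmAgent(ctx: DmDispatchContext): string | null {
  const chain = [ctx.override, ctx.preference, ctx.dm_agent_id, ctx.default_agent_id];
  for (const candidate of chain) {
    if (candidate) return candidate;
  }
  return null; // deny
}
```

The resolved agent still has to pass the DM can_use check before dispatch.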

Bot → BFF Audience

Slack and Webex use the same audience model. The bot mints a team-agnostic user OBO token for the next hop it is calling: the CAIPE UI BFF. That is why CAIPE_PLATFORM_AUDIENCE defaults to caipe-platform. AgentGateway still accepts agentgateway for direct data-plane callers and legacy paths, but bot pre-dispatch checks should not mint aud=agentgateway.

The two load-bearing invariants are:

  1. Audience follows the next hop. Bot pre-dispatch calls target the CAIPE UI BFF, so OBO uses CAIPE_PLATFORM_AUDIENCE (caipe-platform by default). The same bearer can still be forwarded later because Dynamic Agents and AgentGateway accept caipe-platform.
  2. Team context is data-layer derived, not JWT-signed (Phase 3 of spec 2026-05-24-derive-team-from-channel). Channel/space → team mapping is read from MongoDB at every request, and the BFF + AgentGateway PDP evaluate the OpenFGA decision against that mapping. The OBO token is team-agnostic.

Sharing model: assigning a channel to a team transitively shares its agents

Channel-dispatch authorization deliberately uses the channel's mapped team as the user-side subject of the can_use agent:<id> check (team:<slug>#member can_use agent:<id>). This grants broader access than a direct per-user grant on the user → agent edge, because the OpenFGA model lets a user reach the agent through any team they belong to that has the grant.

Operationally that means:

  • Assigning channel C to team:T and then sharing any agent A with C (via the channel's can_use agent tuple) also makes A callable in C by every member of team:T, including members who were never granted A directly.
  • Removing the channel→team assignment, or unsharing the agent from the channel, revokes that transitive access immediately on the next request.
  • A DM with the same user does not inherit this channel→team cascade on its own: DM dispatch uses user:<sub> can_use agent:<id> and ignores channel/team mappings. However, the DM check does fall back to a team-union OpenFGA evaluation against existing team:<slug>#member can_use agent:<id> tuples (see evaluateAgentAccess), so any agent explicitly shared with a team via the Agent editor (next section) is callable in DM by every member of that team.
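The two dispatch checks can be sketched against an in-memory tuple set. This is not evaluateAgentAccess: the function names are hypothetical, and because a plain set cannot resolve OpenFGA usersets, the user's team slugs are passed in explicitly for the DM fallback.

```typescript
// In-memory stand-in for OpenFGA direct tuple lookups.
type Check = (user: string, relation: string, object: string) => boolean;

function canUseInChannel(channelId: string, agentId: string, channelTeams: Map<string, string>, check: Check): boolean {
  const team = channelTeams.get(channelId); // read from channel_team_mappings per request
  if (team === undefined) return false; // unmapped channel: fail closed
  // The mapped team, not the individual, is the subject of the check.
  return check(`team:${team}#member`, "can_use", `agent:${agentId}`);
}

function canUseInDm(sub: string, agentId: string, userTeams: string[], check: Check): boolean {
  if (check(`user:${sub}`, "can_use", `agent:${agentId}`)) return true;
  // Team-union fallback: any team the user belongs to that holds the grant.
  return userTeams.some((t) => check(`team:${t}#member`, "can_use", `agent:${agentId}`));
}
```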

If an agent must stay private to a subset of a team, do not pin it to a channel that is mapped to that team. Either:

  1. Share the agent with a smaller team (or with individual users) and keep the channel mapped to the broader team for other agents, or
  2. Map the channel to a narrower team whose membership matches the intended audience for that agent.

The admin UI (Slack channel and Webex space ReBAC panels) surfaces this trade-off in the top-of-card "Sharing model" callout and in a per-channel heads-up under the agent-association form. Future work may add an optional per-channel agent allow-list that is stricter than the team-level grant; until then, the team cascade is the canonical policy and is documented behavior, not a bug.

Sharing model: explicit "Share with Teams" on an agent

The Agent editor (DynamicAgentEditor) has a "Share with Teams" multi-select that operates on the same two-tuple inheritance pair as the owner team, but additively: selecting a team T writes team:T#member can_use agent:<id> and team:T#admin can_manage agent:<id> to OpenFGA without disturbing the owner-team tuples. The practical consequence is:

  • Every member of team T can DM the agent in a 1:1 chat (because the DM dispatch's team-union fallback resolves user:<sub> → team:T#member → can_use agent:<id>).
  • Every member of team T can use the agent in any Slack channel or Webex space whose channel_team_mappings/webex_space_team_mappings row points at team T (because channel dispatch evaluates team:T#member can_use agent:<id> directly).
  • Every admin of team T inherits can_manage on the agent and can edit, disable, or delete it from the admin surfaces.

Removing a team from the multi-select on the editor is symmetric: POST/PUT /api/dynamic-agents walks the previous shared_with_teams list against the new one and emits OpenFGA delete tuples for every removed slug (via previousSharedTeamSlugs on reconcileAgentRelationships). Until 2026-05-27 this field was Mongo-only, so members of teams selected in the multi-select were silently denied access; see agent_shared_team_grants_backfill_v1 for the one-shot replay that fixes existing agents.
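The add/remove symmetry reduces to a set difference between the old and new team lists. A minimal sketch, assuming tuples are represented as (user, relation, object) triples and that only newly added slugs need a write; team_tuples and reconcile_shared_teams are hypothetical names, not the signature of reconcileAgentRelationships. Owner-team tuples are deliberately outside its scope, matching the additive behavior described above.

```python
TupleKey = tuple[str, str, str]  # (user, relation, object), as in an OpenFGA tuple key


def team_tuples(agent_id: str, team_slug: str) -> list[TupleKey]:
    """The two inheritance tuples written for every team in shared_with_teams."""
    obj = f"agent:{agent_id}"
    return [
        (f"team:{team_slug}#member", "can_use", obj),
        (f"team:{team_slug}#admin", "can_manage", obj),
    ]


def reconcile_shared_teams(
    agent_id: str, previous: list[str], current: list[str]
) -> tuple[list[TupleKey], list[TupleKey]]:
    """Return (writes, deletes) for a change to shared_with_teams.

    Added slugs get both tuples written; removed slugs get both tuples deleted.
    Slugs present in both lists produce no operation.
    """
    prev, curr = set(previous), set(current)
    writes = [t for slug in sorted(curr - prev) for t in team_tuples(agent_id, slug)]
    deletes = [t for slug in sorted(prev - curr) for t in team_tuples(agent_id, slug)]
    return writes, deletes


writes, deletes = reconcile_shared_teams("my-k8s-agent", ["sre", "dev"], ["dev", "ops"])
```

Here "ops" produces two writes, "sre" produces two deletes, and "dev" is left alone. Sorting the slugs only makes the output deterministic.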

The "Effective access" callout under the multi-select is the operator-facing render of exactly which team:<slug>#member tuples the next save will write to OpenFGA, so admins can confirm the transitive grant before the form is submitted.

/use default workflow (DM personal default)

/caipe-use default <agent_id> and /caipe-use default (no agent) update a single per-user preference (dm_default_agent_id) in one round-trip. The bot resolves the agent (or null), checks that the user has can_use on it (when setting), then writes the preference. The next DM dispatches via the personal chain and lands on the new default.

Both the override (live DM dispatch) and the preference (next-DM default) are cleared in a single round-trip when the user passes the bare form (/caipe-use default with no agent), matching FR-029a in spec 2026-05-24-derive-team-from-channel.
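The ordering of the two forms can be sketched as follows. Prefs, handle_use_default, known_agents, and the can_use callback are hypothetical stand-ins (the real bot persists dm_default_agent_id and checks can_use against OpenFGA); the point is the sequence: resolve, authorize, then write, with the bare form clearing both values together.

```python
from typing import Callable, Optional


class Prefs:
    """Hypothetical per-user state: the live DM override and the persisted next-DM default."""

    def __init__(self) -> None:
        self.dm_override: dict[str, str] = {}
        self.dm_default_agent_id: dict[str, str] = {}


def handle_use_default(
    user: str,
    agent_id: Optional[str],
    prefs: Prefs,
    known_agents: set[str],
    can_use: Callable[[str, str], bool],
) -> str:
    if agent_id is None:
        # Bare form: clear the live override and the next-DM default together.
        prefs.dm_override.pop(user, None)
        prefs.dm_default_agent_id.pop(user, None)
        return "cleared"
    if agent_id not in known_agents:   # resolve step
        return "unknown agent"
    if not can_use(user, agent_id):    # authorize before writing anything
        return "denied"
    prefs.dm_default_agent_id[user] = agent_id
    return f"default set to {agent_id}"
```

A denied request leaves any existing default untouched, so a typo or a revoked grant cannot silently wipe the user's preference.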

Security Properties of OBO

| Property | Mechanism |
|---|---|
| Bot cannot forge a user identity | Keycloak only issues the OBO token if the bot's client_id has the token-exchange permission granted in the realm |
| Delegation is auditable | act.sub in the JWT records the bot as the delegating party, verifiable in any JWKS-aware system |
| User/team relationships are enforced, not bot identity | OpenFGA checks use the impersonated user's sub and team relationships from the OBO token context |
| Token expiry still applies | OBO tokens carry the same exp handling as a normal Keycloak token; expired tokens are rejected at every JWKS validation point |
| Unlinked users are blocked at the edge | rbac_global_middleware in the Slack bot rejects unlinked users before they reach the supervisor; the linking prompt is sent at most once per SLACK_LINKING_PROMPT_COOLDOWN seconds (default: 3600) |
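The SLACK_LINKING_PROMPT_COOLDOWN behavior in the table above is a per-user rate limit. A minimal in-process sketch, assuming the cooldown is read from that variable; LinkPromptThrottle and its injectable clock are hypothetical, and a bot running several replicas would need shared state (for example in MongoDB or Redis) rather than a local dict.

```python
import os
import time
from typing import Callable


class LinkPromptThrottle:
    """Send the linking prompt at most once per user per cooldown window."""

    def __init__(self, cooldown_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last_sent: dict[str, float] = {}

    def should_prompt(self, slack_user_id: str) -> bool:
        now = self.clock()
        last = self._last_sent.get(slack_user_id)
        if last is not None and now - last < self.cooldown_s:
            return False  # still inside the window: block without re-sending the prompt
        self._last_sent[slack_user_id] = now
        return True


throttle = LinkPromptThrottle(float(os.environ.get("SLACK_LINKING_PROMPT_COOLDOWN", "3600")))
```

The middleware still rejects every unlinked message; the throttle only decides whether this rejection also carries a linking prompt.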

There are three onboarding paths, in priority order: (1) auto-link to existing Keycloak user, (2) JIT-create a new shell user (spec 103), (3) HMAC-signed link URL as fallback.
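The priority order is a short fall-through chain. In this sketch, auto_link and jit_create are hypothetical callables that return a Keycloak user UUID or None (None from jit_create covers a disallowed domain or a failed create); the returned label only records which path succeeded.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def onboard(email: str, auto_link: Lookup, jit_create: Lookup, jit_enabled: bool) -> tuple[str, Optional[str]]:
    # (1) Auto-link: an existing Keycloak user with this exact email.
    user_id = auto_link(email)
    if user_id:
        return ("auto_link", user_id)
    # (2) JIT: create a federated-only shell user, if enabled.
    if jit_enabled:
        user_id = jit_create(email)
        if user_id:
            return ("jit", user_id)
    # (3) Fallback: DM an HMAC-signed link URL; there is no user id yet.
    return ("link_url", None)
```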

1. Auto-bootstrap (default, SLACK_FORCE_LINK=false)

On the user's first Slack message the bot:

  1. Calls Slack users.info → fetches profile.email
  2. Queries Keycloak Admin API for a user with that exact email
  3. If found: writes slack_user_id attribute → linked silently, zero user action required
  4. If not found: the bot falls through to path 2 (JIT) below.

2. Just-In-Time user creation (default ON, SLACK_JIT_CREATE_USER=true)

When no existing Keycloak user matches the Slack email, and JIT is enabled, the bot:

  1. Optionally checks the email domain against SLACK_JIT_ALLOWED_EMAIL_DOMAINS (comma-separated allowlist; empty = any domain).
  2. POSTs to /admin/realms/{realm}/users using the same KEYCLOAK_SLACK_BOT_ADMIN_* credentials (caipe-platform service account, holds realm-management:{view-users, query-users, manage-users} for this path).
  3. The created user is federated-only: no password, no required actions, emailVerified=true, with attributes slack_user_id, created_by=slack-bot:jit, created_at=<RFC3339>.
  4. Race-safe: an HTTP 409 from a concurrent create is resolved by re-querying the email and returning the surviving UUID.
  5. On failure (4xx/5xx/network), the bot logs event=jit_failed error_kind=<auth_failure|forbidden|server_error|network_error|unexpected> and falls through to path 3 (the HMAC-signed link URL).
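The allowlist, the federated-only payload, and the 409 race handling can be sketched as one function. The create_user and find_user_by_email callables stand in for the Keycloak Admin API calls and are hypothetical, as are HttpError and the exact mapping of status codes onto error_kind tokens. The payload uses Keycloak UserRepresentation fields (attributes are lists of strings); setting username to the email is an assumption.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("slack_bot.jit")


class HttpError(Exception):
    """Hypothetical stand-in for a non-2xx Keycloak Admin API response."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def parse_allowed_domains(raw: str) -> set[str]:
    """Parse SLACK_JIT_ALLOWED_EMAIL_DOMAINS; an empty set means any domain is allowed."""
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def error_kind(exc: Exception) -> str:
    """Map a failure onto the documented error_kind tokens (mapping is an assumption)."""
    if isinstance(exc, HttpError):
        if exc.status == 401:
            return "auth_failure"
        if exc.status == 403:
            return "forbidden"
        if exc.status >= 500:
            return "server_error"
    if isinstance(exc, ConnectionError):
        return "network_error"
    return "unexpected"


def jit_create(
    email: str,
    slack_user_id: str,
    allowed_domains: set[str],
    created_at: str,
    create_user: Callable[[dict], str],                  # POST .../users, returns the new UUID
    find_user_by_email: Callable[[str], Optional[str]],  # exact-email lookup
) -> Optional[str]:
    domain = email.rsplit("@", 1)[-1].lower()
    if allowed_domains and domain not in allowed_domains:
        return None
    user = {
        "username": email,  # assumption: the username mirrors the email
        "email": email,
        "enabled": True,
        "emailVerified": True,
        "requiredActions": [],  # federated-only: no credentials, no required actions
        "attributes": {
            "slack_user_id": [slack_user_id],
            "created_by": ["slack-bot:jit"],
            "created_at": [created_at],
        },
    }
    try:
        return create_user(user)
    except HttpError as exc:
        if exc.status == 409:
            # A concurrent create won the race; return the surviving user's UUID.
            return find_user_by_email(email)
        log.warning("event=jit_failed error_kind=%s", error_kind(exc))
    except Exception as exc:
        log.warning("event=jit_failed error_kind=%s", error_kind(exc))
    return None
```

Returning None on every failure keeps the caller's fall-through to the link URL uniform, while the stable event=jit_failed token stays greppable for SIEM.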

JIT is default ON in dev so first-time DMs work without an admin handshake. Set SLACK_JIT_CREATE_USER=false in production if you want web-UI onboarding to be a hard prerequisite; in that case, all unknown emails go to the link URL below.

Single-credential design (spec 103, plan R-8). JIT deliberately reuses the existing caipe-platform admin client rather than introducing a separate caipe-slack-bot-provisioner. This trades strict privilege separation (one secret can both read and create users) for operational simplicity (one Secret to manage, one rotation procedure, one audit identity). Compensating mitigations: only the create_user_from_slack helper writes /users; all JIT actions are logged with stable event=jit_* tokens for SIEM. The same service account also holds client/authz realm-management roles because the Web UI BFF Keycloak RBAC migration must inspect and repair OBO clients and scope permissions.

3. HMAC-signed link URL (fallback)

Whenever auto-link finds no user and JIT is disabled, the email domain is not allow-listed, or the JIT create fails, the bot DMs an HMAC-signed URL:

/api/auth/slack-link?slack_user_id=U09TC6RR8KX&ts=1713196400&sig=<HMAC-SHA256>

The signature is computed with SLACK_LINK_HMAC_SECRET, which prevents forged links, and the link is time-bound (the TTL is enforced server-side). After the user completes OIDC login, the server writes slack_user_id to that user's Keycloak account via the Admin API.
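A sketch of signing and verifying such a URL with Python's standard hmac module. The signed message layout ("<slack_user_id>:<ts>"), the helper names, and the 900-second TTL are assumptions for illustration; the document only states that sig is an HMAC-SHA256 keyed with SLACK_LINK_HMAC_SECRET and that the TTL is enforced server-side.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode


def _sign(secret: bytes, slack_user_id: str, ts: int) -> str:
    # Assumption: the signed message is "<slack_user_id>:<ts>".
    message = f"{slack_user_id}:{ts}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_link(secret: bytes, slack_user_id: str, ts: Optional[int] = None) -> str:
    ts = int(time.time()) if ts is None else ts
    query = urlencode({"slack_user_id": slack_user_id, "ts": ts, "sig": _sign(secret, slack_user_id, ts)})
    return f"/api/auth/slack-link?{query}"


def verify_link(secret: bytes, slack_user_id: str, ts: int, sig: str,
                ttl_s: int = 900, now: Optional[int] = None) -> bool:
    now = int(time.time()) if now is None else now
    if not 0 <= now - ts <= ttl_s:
        return False  # expired, or timestamped in the future
    # Constant-time comparison so the signature cannot be probed byte by byte.
    return hmac.compare_digest(_sign(secret, slack_user_id, ts), sig)
```

Because ts is part of the signed message, extending a link's lifetime by editing ts in the URL invalidates the signature.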

The user always gets an actionable path forward: the previous "contact your admin" dead-end was removed in spec 103 (FR-007).

In all three modes, once the link is established, all future Slack messages carry the user's Keycloak identity automatically, with no repeated login.

Privacy in logs

All log lines that reference a Slack profile email run it through mask_email() (spec 103 FR-010): alice@corp.com → ali***@corp.com. The domain stays visible for SIEM tenant attribution; the rest of the local part is redacted.
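A sketch that reproduces the documented example. It is not the bot's actual mask_email(): keeping only the first character of local parts that are three characters or shorter is an assumption, since keeping three would reveal the whole local part.

```python
def mask_email(email: str) -> str:
    """Keep the domain and a short prefix of the local part; redact the rest."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        return "***"  # not an email address: redact everything
    keep = 3 if len(local) > 3 else 1  # assumption for short local parts
    return f"{local[:keep]}***@{domain}"


print(mask_email("alice@corp.com"))  # ali***@corp.com
```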


Slack Channel → Team + Agent ReBAC

Badge analogy: Each Slack channel is a dedicated help-desk line. An admin assigns the line to a team and grants one or more Dynamic Agents to that line. When a user calls in, the operator checks both the channel grant and the user's team/agent relationship before patching them through.

How It Works

Slack channel routing now separates "which team owns this channel?" from "which Dynamic Agents may be used here?" The workspace key is a configured alias (SLACK_WORKSPACE_ALIAS, for example CAIPE) rather than Slack's opaque team_id; the Slack bot maps incoming team_id values to that alias before looking up routes or grants. When a message arrives, the Slack bot reads the OpenFGA tuples slack_channel:<workspace_alias>--<channel_id> user agent:<id>, then joins optional slack_channel_agent_routes metadata for listen mode and priority. Stale Mongo route rows without a matching OpenFGA tuple are ignored. Operators can set SLACK_AGENT_ROUTES_MODE=config for static-only routing, or SLACK_AGENT_ROUTES_MODE=db_only to use only UI-managed, OpenFGA-backed routes. The selected agent is then verified against OpenFGA:

  1. Team lookup: query channel_team_mappings in MongoDB by slack_channel_id.
  2. Optional first-message auto-assignment: when SLACK_AUTO_ASSIGN_UNMAPPED_CHANNELS=true and no active mapping exists, write the configured SLACK_DEFAULT_TEAM_SLUG mapping, the default slack_channel:<workspace_alias>--<channel_id> user agent:<id> OpenFGA tuple, and matching route metadata.
  3. Team-agnostic OBO mint: mint the user's OBO token without a team-* scope (Phase 3 of spec 2026-05-24-derive-team-from-channel removed the per-team scope mint; channel→team is now resolved from channel_team_mappings on every request).
  4. Channel association lookup: read OpenFGA channel-agent tuples and join Mongo route metadata only for tuple-backed agents.
  5. Channel ReBAC check: call the Slack channel access checker for slack_channel:<workspace_alias>--<channel_id> can_use agent:<id> and the user's team/agent relationship (team derived from the channel mapping).
  6. Route: dispatch to the selected agent_id only after both the channel association and user/team agent grant allow the request.
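Steps 1, 4, 5, and 6 can be sketched as one fail-closed function (the OBO mint in step 3 and auto-assignment in step 2 are not modeled). The names, the RouteMeta shape, and the rule that a lower priority value wins are assumptions; tuples follow the document's form, with the channel as the user, user as the relation, and agent:<id> as the object.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TupleKey = tuple[str, str, str]  # (user, relation, object)


@dataclass
class RouteMeta:
    """Non-authoritative metadata from slack_channel_agent_routes (listen mode omitted here)."""
    agent_id: str
    priority: int = 100


def channel_object(workspace_alias: str, channel_id: str) -> str:
    return f"slack_channel:{workspace_alias}--{channel_id}"


def resolve_dispatch(
    workspace_alias: str,
    channel_id: str,
    user: str,
    team_by_channel: dict[str, str],
    tuples: set[TupleKey],
    routes: dict[str, RouteMeta],
    user_team_allows: Callable[[str, str, str], bool],
) -> Optional[str]:
    # Step 1: team lookup (channel_team_mappings). An unmapped channel is denied.
    team = team_by_channel.get(channel_id)
    if team is None:
        return None
    # Step 4: channel associations come from OpenFGA tuples only.
    channel = channel_object(workspace_alias, channel_id)
    agents = {obj.removeprefix("agent:") for (u, rel, obj) in tuples if u == channel and rel == "user"}
    if not agents:
        return None
    # Join Mongo metadata only for tuple-backed agents; stale rows never become candidates.
    candidates = sorted((routes.get(a, RouteMeta(a)) for a in agents), key=lambda r: (r.priority, r.agent_id))
    selected = candidates[0].agent_id
    # Steps 5-6: dispatch only if the user's team/agent relationship also allows it.
    return selected if user_team_allows(user, team, selected) else None
```

A route row for an agent with no tuple (for example a low-priority "stale" entry) is simply never considered, which is the non-authoritative behavior described for slack_channel_agent_routes.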

The Slack YAML config still registers channels and remains the fallback route source in the default db_prefer mode. Runtime channel-agent authorization lives in OpenFGA; Mongo route rows are non-authoritative metadata and are deleted when the admin deletes the channel-agent association. The OpenFGA Policy Graph overlays channel_team_mappings as read-only assigned_team routing metadata edges so operators can see channel ownership next to OpenFGA grants without treating that ownership as a mutable tuple.
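The three SLACK_AGENT_ROUTES_MODE values can be read as a route-source selector. A minimal sketch, assuming the db_prefer fallback is evaluated per channel and that db_routes has already been filtered to OpenFGA tuple-backed rows; routes_mode and route_source are hypothetical names, and rejecting unknown mode values is a choice of this sketch.

```python
from typing import TypeVar

T = TypeVar("T")
VALID_MODES = {"config", "db_prefer", "db_only"}


def routes_mode(env: dict[str, str]) -> str:
    mode = env.get("SLACK_AGENT_ROUTES_MODE", "db_prefer")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported SLACK_AGENT_ROUTES_MODE: {mode!r}")
    return mode


def route_source(mode: str, db_routes: list[T], yaml_routes: list[T]) -> list[T]:
    """db_routes must already be filtered to OpenFGA tuple-backed rows."""
    if mode == "config":
        return yaml_routes  # static YAML only
    if mode == "db_only":
        return db_routes  # UI-managed routes only; no fallback
    return db_routes or yaml_routes  # db_prefer: UI routes first, static YAML as fallback
```

In practice the mode would be read once at startup, for example with `routes_mode(dict(os.environ))`.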

The Slack Channels admin panel also includes Slack Runtime Diagnostics for the selected channel. It calls /api/admin/slack/channels/{workspaceId}/{channelId}/diagnostics to run the same OpenFGA tuple read the Slack bot performs, compares the tuple-backed agents with slack_channel_agent_routes, flags stale Mongo metadata that the runtime ignores, flags listen-mode mismatches (such as mention-only routes that will ignore plain messages), and shows the latest slack_bot runtime error from audit_events.

Slack route misses fail closed without turning ambient channel chatter into bot noise. For plain channel messages, the bot still records OpenFGA read failures and listen-mode mismatches for Slack Runtime Diagnostics, but it does not post a route-miss notice unless the user explicitly invoked the bot. During initial setup, SLACK_INTEGRATION_SILENCE_ENV=true stops Slack handlers before they can send user-visible responses at all. Diagnostics remains the operator-facing path for common errors: stale metadata without an OpenFGA tuple can be removed, and mention-only/message-only routes can be updated to listen: all.

Keycloak Role → ReBAC Transition Check

The transition comparison API is intentionally read-only and engineer-facing:

  1. Engineers call /api/rbac/enforcement-comparison with a subject/action/resource plus observed identity/group context.
  2. The API checks the same relationship in OpenFGA; legacy realm-role classification is historical-only.
  3. If the resource type is rebac_enforced, matching per-resource roles are reported as ignored and the effective decision comes only from ReBAC.

Admin UI

Admins configure channel/team ownership in Admin → Teams → selected team → Slack Channels and channel/agent grants in Admin → Integrations → Slack.

  • Channel/team ownership is exclusive: a channel cannot be actively mapped to two teams.
  • Channel/agent associations are many-to-many OpenFGA tuples: a channel can have multiple Dynamic Agent associations.
  • Removing an association deletes the OpenFGA tuple and its saved Mongo listen/priority metadata, denying that resource in the channel even if the user has access elsewhere.
  • UI-managed route dispatch is the default with static YAML fallback (SLACK_AGENT_ROUTES_MODE=db_prefer). Set config only for static YAML routing, and use db_only only after the channel's OpenFGA-backed UI routes are complete.
  • Runtime auto-assignment is opt-in with SLACK_AUTO_ASSIGN_UNMAPPED_CHANNELS=true, SLACK_DEFAULT_TEAM_SLUG, and SLACK_DEFAULT_AGENT_ID. It only handles channels with no active mapping and never changes an already assigned channel.
  • Runtime sync/reload uses the Web UI backend as the browser-facing boundary. caipe-ui authorizes the admin user, calls the Slack bot admin API with a Keycloak client-credentials token, and the Slack bot verifies that token with JWKS before exposing route status, cache reload, or static-config upsert sync.
  • Deep links that include subtab=slack or openfgaTab=slack canonicalize to Admin → Integrations → Slack, even if an older link still carries cat=system&tab=settings.
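The auto-assignment guard from the list above can be sketched as a planning function that returns the three writes (mapping, OpenFGA tuple, route metadata) or None. plan_auto_assign is a hypothetical name; storing the team slug in team_id is a simplification, since the real channel_team_mappings rows store the team's id.

```python
from typing import Optional


def plan_auto_assign(channel_id: str, workspace_alias: str, mappings: list[dict], env: dict[str, str]) -> Optional[dict]:
    """Return the three writes for an unmapped channel, or None when nothing should change."""
    if env.get("SLACK_AUTO_ASSIGN_UNMAPPED_CHANNELS", "false").lower() != "true":
        return None  # opt-in only
    if any(m["slack_channel_id"] == channel_id and m.get("active") for m in mappings):
        return None  # never touch an already assigned channel
    team, agent = env.get("SLACK_DEFAULT_TEAM_SLUG"), env.get("SLACK_DEFAULT_AGENT_ID")
    if not team or not agent:
        return None
    return {
        # Simplification: the real collection stores the team's id, not its slug.
        "mapping": {"slack_channel_id": channel_id, "team_id": team,
                    "slack_workspace_id": workspace_alias, "active": True},
        "tuple": (f"slack_channel:{workspace_alias}--{channel_id}", "user", f"agent:{agent}"),
        "route": {"workspace_id": workspace_alias, "channel_id": channel_id,
                  "agent_id": agent, "enabled": True},
    }
```

Checking for any active mapping before planning is also what keeps channel/team ownership exclusive: an auto-assignment can never add a second active team for a channel.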

MongoDB Collection: channel_team_mappings

{
  "_id": ObjectId,
  "slack_channel_id": "C0123456789",
  "team_id": "6612...",
  "channel_name": "#k8s-support",
  "slack_workspace_id": "CAIPE",
  "created_by": "admin@example.com",
  "created_at": ISODate,
  "active": true
}

OpenFGA Tuple: Slack Channel Agent Association

slack_channel:CAIPE--C0123456789 user agent:my-k8s-agent

The channel-agent association lives in OpenFGA. The agent:<id> value is the Dynamic Agent slug (string _id in the dynamic_agents collection). The legacy slack_channel_grants collection may exist during migration, but it is not an allow source for Slack runtime channel-agent decisions.

MongoDB Collection: slack_channel_agent_routes

{
  "workspace_id": "CAIPE",
  "channel_id": "C0123456789",
  "agent_id": "my-k8s-agent",
  "enabled": true,
  "priority": 100,
  "users": { "enabled": true, "listen": "mention" },
  "source_type": "manual",
  "status": "active",
  "created_by": "admin@example.com",
  "created_at": "2026-05-12T00:00:00.000Z"
}

This row is metadata for a matching OpenFGA tuple. It does not authorize dispatch by itself, and it is deleted when the channel-agent association is deleted.


Web UI Object-Level Checks

For UI-owned resource surfaces, the BFF performs the coarse session or legacy scope gate first and then checks the concrete OpenFGA object before returning or proxying data.

Current strict surfaces include:

  • conversation:<id> for chat list/read/write/share/stream and message persistence.
  • skill:<id> for catalog/config/hub file and scan access.
  • admin_surface:rag_datasources for RAG Data Sources tab administration.
  • knowledge_base:<id> for RAG proxy paths, datasource list filtering, search filter injection, and direct RAG API/MCP checks.
  • agent:<id> for Dynamic Agent listing and mutation.
  • mcp_server:agentgateway for AgentGateway discovery/sync.
  • mcp_server:<id>#can_discover for the Create Agent → Tools Probe button. Probing only enumerates advertised tool metadata, so it is gated on can_discover rather than can_invoke; the model already grants discover to organization members, organization admins, team-shared members, owners, and channel/group routings.
  • system_config:platform_settings for platform configuration.

Conversation checks use implicit owner access first and explicit OpenFGA relationships for non-owner access. RAG proxy calls still forward the Keycloak bearer token after the BFF PDP decision, so RAG validates issuer, audience, signature, and expiry with Keycloak before checking OpenFGA using team-derived knowledge_base relationships.

The Dynamic Agent built-in tool catalog at GET /api/dynamic-agents/builtin-tools is intentionally not strict-gated: the catalog is a static metadata listing of supported built-in tool types (web_search, file_io, etc.), it is needed by every authenticated user who can open the Create Agent wizard, and per-tool authorization happens at MCP invocation time. The route requires an authenticated session and forwards the bearer token to dynamic-agents (where DA_REQUIRE_BEARER still applies); it does not consult OpenFGA. Task Builder routes are intentionally excluded from this pass because they are scheduled for refactor.
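The conversation rule (coarse session gate, then implicit owner access, then an explicit OpenFGA relationship) can be sketched as follows. authorize_conversation, the fga_check callback, and relation names such as can_read are hypothetical; a real BFF would call the OpenFGA Check API at that point.

```python
from typing import Callable, Optional

FgaCheck = Callable[[str, str, str], bool]  # (user, relation, object) -> allowed


def authorize_conversation(
    session_sub: Optional[str],
    owner_sub: str,
    conversation_id: str,
    relation: str,
    fga_check: FgaCheck,
) -> bool:
    # Coarse gate: without an authenticated session there is no object check at all.
    if session_sub is None:
        return False
    # Implicit owner access is evaluated first and needs no OpenFGA round-trip.
    if session_sub == owner_sub:
        return True
    # Non-owners need an explicit relationship on the concrete conversation object.
    return fga_check(f"user:{session_sub}", relation, f"conversation:{conversation_id}")
```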


Compact End-to-End Request Flow (Reference)

A condensed text-only version of the per-request sequence above. Useful for runbooks and incident-response playbooks where a Mermaid diagram is overkill.

Slack User: "What's the status of my ArgoCD deployment?"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: Identity Resolution (Slack Bot)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
slack_user_id U09TC6RR8KX
  → Keycloak Admin API lookup by attribute
  → user: { id: "a3f9...", email: "alice@example.com" }
RFC 8693 exchange → OBO JWT
  sub=alice, act.sub=slack-bot

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2: Supervisor Ingestion (A2A + LangGraph)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
POST /a2a  Authorization: Bearer OBO_JWT
  → OAuth2Middleware: validates RS256 signature against JWKS
  → JwtUserContextMiddleware: decodes claims, stores in ContextVar
  → agent_executor: get_jwt_user_context() → email=alice
  → LangGraph selects ArgoCD MCP tool

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3: Policy Enforcement (AgentGateway)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
POST /argocd/...  Authorization: Bearer OBO_JWT
  → ext_authz: OpenFGA check for caller/team/tool relationship → ALLOW
  → Proxy to ArgoCD MCP Server

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 4: MCP Tool Execution (ArgoCD MCP Server)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Validates OBO JWT against Keycloak JWKS independently
Extracts email=alice, tenant=acme
Returns deployments scoped to alice's tenant

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Response path: MCP → Gateway → Supervisor → Slack → User
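Step 2 relies on a per-request context so that agent_executor can read the caller without passing claims through every call. A minimal sketch of that pattern with Python's contextvars, assuming the OBO JWT has already been validated upstream (as OAuth2Middleware does); the names mirror the flow above but are hypothetical re-implementations, not CAIPE's middleware.

```python
import asyncio
import contextvars
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

_jwt_user: contextvars.ContextVar[Optional[dict]] = contextvars.ContextVar("jwt_user", default=None)


def get_jwt_user_context() -> Optional[dict]:
    """Return the claims stored for the current request, or None outside a request."""
    return _jwt_user.get()


async def with_jwt_user_context(claims: dict, handler: Callable[[], Awaitable[T]]) -> T:
    """Store already-validated claims for the duration of one request."""
    token = _jwt_user.set(claims)
    try:
        return await handler()
    finally:
        _jwt_user.reset(token)  # never leak one caller's identity into the next request


async def agent_executor() -> str:
    await asyncio.sleep(0)  # yield, so concurrent requests interleave
    claims = get_jwt_user_context()
    return f"{claims['email']} via {claims['act']['sub']}"
```

Because each asyncio task runs in its own copy of the context, two interleaved requests (alice and bob, both delegated by slack-bot) each see only their own claims.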