RBAC Workflows
Sequence diagrams and flow narratives for "what happens when X". Pair this with Architecture (which describes each component); this doc is about how those components interact over time.
If you only have 5 minutes, read Per-request authorization: it's the most important diagram in CAIPE.
Login + First-Time Broker Login
This is the once-per-session flow. After it completes, the user holds a Keycloak-backed UI session and usually never sees Keycloak (or the upstream IdP: Okta, Duo SSO, etc.) again until the Keycloak SSO session can no longer be refreshed.
The default Keycloak "first broker login" flow shows a "Review Profile" page and, if a local account with the same email already exists, a "Confirm Link Account" page. Both are eliminated by the custom flow patched in by init-idp.sh:
caipe-silent-broker-login (both executions: ALTERNATIVE)
│
├── idp-create-user-if-unique
│   Condition: no local user with this email exists
│   Action: provision new Keycloak user, assign default roles
│
└── idp-auto-link
    Condition: local user with matching email already exists
    Action: link external identity to existing account silently
This only works correctly because trustEmail=true is set on the IdP. That flag tells Keycloak to treat the email claim from the upstream IdP (Okta, Duo SSO, Azure AD, …) as authoritative for account matching.
Production installs should also keep keycloak.idp.forceRedirect=true (exported to KEYCLOAK_FORCE_IDP_REDIRECT=true). That makes the app realm's browser flow require the IdP redirector and disables the local Keycloak username/password form, so users go to the enterprise IdP even if a client does not send kc_idp_hint.
The other side of the kc_idp_hint contract is on the UI: ui/src/lib/auth-config.ts only spreads kc_idp_hint=${OIDC_IDP_HINT} into the NextAuth authorization params when OIDC_IDP_HINT is set and non-empty. Two tests pin this contract end-to-end:
- ui/src/lib/__tests__/auth-config.test.ts (describe "OIDC kc_idp_hint forwarding") asserts the hint is forwarded verbatim when configured and is omitted when the env var is unset or empty.
- tests/integration/test_keycloak_idp_hint_redirect.sh boots a throwaway Keycloak, runs init-idp.sh, and asserts GET /realms/caipe/protocol/openid-connect/auth 302/303s to /broker/${IDP_ALIAS}/login with no hint, with a valid hint, and degrades gracefully (no 5xx) with an unknown hint.

Both are gated in CI via the idp-hint-test job in .github/workflows/ci-keycloak-init.yml (also runnable locally via make test-keycloak-idp-hint or make test-keycloak-sso-all). See Secrets bootstrap → SSO bootstrap → kc_idp_hint and the IdP redirector for the long-form walkthrough.
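The conditional forwarding contract described above can be illustrated with a minimal sketch. The function name buildAuthorizationParams and the base scope value are assumptions made for illustration; the real logic lives in ui/src/lib/auth-config.ts and may differ.

```typescript
// Hypothetical helper: not the actual code in ui/src/lib/auth-config.ts.
function buildAuthorizationParams(
  idpHint: string | undefined,
): Record<string, string> {
  // Assumed base params; the real scope list may differ.
  const base: Record<string, string> = { scope: "openid email profile" };
  // Spread kc_idp_hint only when the env value is set and non-empty.
  return {
    ...base,
    ...(idpHint ? { kc_idp_hint: idpHint } : {}),
  };
}
```

Called with an unset or empty OIDC_IDP_HINT, the result has no kc_idp_hint key at all, so the realm's browser flow (with forceRedirect enabled) decides where the user goes.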
Security implication: if the upstream IdP can be compromised to issue arbitrary email claims, an attacker could link to any existing account. This is acceptable here because Okta and Duo SSO (and other supported IdPs) are corporate SSO providers: trust in the email claim is the same as trust in the IdP.
The complete one-time login sequence (Browser → Keycloak → upstream IdP → Keycloak → CAIPE UI) is shown inline in Per-request authorization below; look for the "One-time login path" rectangle. With the default realm policy, active users refresh silently through Keycloak for up to the configured SSO idle and max lifetimes.
Per-Request Authorization (End to End)
This is the RBAC sequence diagram. It traces a single Slack message ("list my ArgoCD apps") all the way through OBO token exchange, supervisor middleware, AgentGateway extAuthz / OpenFGA evaluation, and into the MCP server. JWKS refresh and one-time login timelines run alongside the hot path and the diagram shows how they converge.
Read this diagram as four independent timelines that happen to converge:
- Policy timeline: admins change ReBAC relationships through the OpenFGA/ReBAC UI and team resource APIs. Those writes update MongoDB provenance and OpenFGA tuples; AgentGateway does not maintain a CEL policy CRUD surface or Mongo-backed config bridge.
- Key timeline: Keycloak publishes its signing keys on a public endpoint. AG fetches them lazily (startup, TTL expiry, or unknown kid). Keycloak is not a runtime dependency of AG: requests succeed even if Keycloak is briefly unreachable, as long as the cached JWKS has a valid key for the JWT's kid.
- Login timeline: Duo SSO authenticates the human at the start of the Keycloak SSO session. Keycloak exchanges that Duo assertion for CAIPE tokens; the UI keeps refreshing 1-hour access tokens through Keycloak while the 8-hour idle / 24-hour max SSO session remains valid. CAIPE UI keeps large OAuth tokens in a server-side token cache and only stores slim session metadata in the httpOnly cookie. If that cache is lost after a UI restart, RAG and token-enforced data-plane calls redirect through login, while Dynamic Agents browser proxy routes can still forward the signed-in X-User-Context fallback for configuration and save flows; AgentGateway-backed MCP probes/tools may still require a fresh Keycloak bearer. On every successful CAIPE login, the BFF reconciles OpenFGA tuples for users who passed OIDC_REQUIRED_GROUP; those tuples come from the admin-managed default OpenFGA grant profile bundle when present, otherwise the built-in Org Member / Org Admin defaults. Users in OIDC_REQUIRED_ADMIN_GROUP or BOOTSTRAP_ADMIN_EMAILS receive the selected admin profile grants. Team-assigned custom profiles override the global member/admin profile for matching team users and are materialized as direct user tuples; multiple team overrides union with each other. Admin changes in Security & Policy → OpenFGA → Default FGA Grants can save future-login templates and reconcile all known users immediately. If enabled, CAIPE also uses the login-time memberOf/groups claims to reconcile only the signed-in user's managed team memberships. This claim path is additive; full inventory, removals, and drift still come from direct Okta/AD API sync. Duo is not on the request hot path: it is only touched on login or when Keycloak/upstream IdP policy requires interactive reauthentication. AgentGateway only needs to understand the Keycloak-signed JWT and the OpenFGA decision.
- Request timeline: the OBO JWT carries the user's identity and roles end-to-end. The same token is verified by AG (edge) and optionally re-verified by the MCP server (depth). This is deliberate: a compromised AG doesn't let tokens past MCP without signature check.
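The lazy JWKS fetch on the key timeline (startup, TTL expiry, or unknown kid) can be pictured with the small decision function below. This is a sketch only: the type JwksCache, the function jwksRefreshReason, and the TTL parameter are hypothetical names, not AgentGateway's actual implementation or configuration.

```typescript
// Hypothetical model of the gateway's cached key set.
interface JwksCache {
  keyIds: string[];           // kids currently held in the cache
  fetchedAtMs: number | null; // null means the JWKS was never fetched
}

type RefreshReason = "startup" | "ttl_expired" | "unknown_kid" | null;

// Returns why a JWKS fetch is needed, or null when the cache can verify the token.
function jwksRefreshReason(
  cache: JwksCache,
  tokenKid: string,
  nowMs: number,
  ttlMs: number,
): RefreshReason {
  if (cache.fetchedAtMs === null) return "startup";
  if (nowMs - cache.fetchedAtMs >= ttlMs) return "ttl_expired";
  if (cache.keyIds.indexOf(tokenKid) === -1) return "unknown_kid";
  return null; // cached key is usable; Keycloak is not contacted
}
```

The null branch is the reason Keycloak can be briefly unreachable without failing requests: as long as the cache is fresh and holds the token's kid, no network call is made.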
Demo tip: when presenting this diagram live, start by highlighting the Login timeline and note "this happens once, then Keycloak refreshes the CAIPE access token while the SSO session is active". Then trace through the Request timeline and ask the audience where Duo appears. The answer is nowhere, because every downstream check uses the Keycloak-signed JWT. This is the clearest way to explain why CAIPE can swap IdPs without touching agent code.
Dynamic Agent Invocation
Dynamic Agent start, invoke, resume, and cancel requests have two authorization
layers. The Web UI backend blocks denied callers before any backend proxy call by
checking agent use plus conversation write access. Conversation write uses the
hybrid model: implicit MongoDB ownership (owner_subject or legacy owner_id)
is accepted for private conversations, while shared or delegated writes fall
through to explicit OpenFGA conversation:<id> relationships. The Dynamic
Agents runtime repeats the agent-use check before agent lookup or runtime work.
The same sequence applies to POST /api/v1/chat/invoke,
POST /api/v1/chat/stream/resume, and POST /api/v1/chat/stream/cancel (cancel
does not start runtime work, but it still requires agent use and conversation write authorization). The RBAC Audit tab
surfaces Web UI backend and Dynamic Agents OpenFGA decisions as OpenFGA ReBAC rows with
pdp=openfga and the checked tuple in resource_ref. MongoDB audit_events
is authoritative for compliance and history; Jaeger/OTel can still be enabled
for request-flow debugging, but the Admin UI does not need it to show authz
decisions.
Slack follow-up bookkeeping uses PATCH /api/chat/conversations/[id]/metadata
after a response is posted. That endpoint uses the same implicit-owner-or-explicit
conversation write check, so a Slack OBO token for the conversation owner can
update thread metadata such as last_processed_ts without a separate
conversation:<id>#writer tuple.
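The hybrid conversation-write check (implicit Mongo ownership first, explicit OpenFGA relationship otherwise) can be sketched as follows. The types, the legacyId field on the caller, and the synchronous fgaCheck callback are assumptions for illustration; the real check is presumably asynchronous and lives in the Web UI backend.

```typescript
interface ConversationDoc {
  id: string;
  owner_subject?: string; // Keycloak sub of the owner
  owner_id?: string;      // legacy owner identifier
}

interface Caller {
  sub: string;
  legacyId?: string; // hypothetical: whatever value legacy owner_id was keyed on
}

// Hypothetical stand-in for an OpenFGA check call.
type FgaCheck = (user: string, relation: string, object: string) => boolean;

function canWriteConversation(
  conv: ConversationDoc,
  caller: Caller,
  fgaCheck: FgaCheck,
): boolean {
  const ownsBySubject =
    conv.owner_subject !== undefined && conv.owner_subject === caller.sub;
  const ownsByLegacyId =
    conv.owner_id !== undefined &&
    caller.legacyId !== undefined &&
    conv.owner_id === caller.legacyId;
  // Implicit owner path: no per-conversation tuple is required.
  if (ownsBySubject || ownsByLegacyId) return true;
  // Shared or delegated writes need an explicit relationship.
  return fgaCheck(`user:${caller.sub}`, "writer", `conversation:${conv.id}`);
}
```

This is why a Slack OBO token for the conversation owner can update thread metadata without a separate writer tuple: the first branch returns before OpenFGA is consulted.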
Self-Service Resource Creation
Private and team-scoped Dynamic Agents, MCP servers, and RAG data sources use the same OpenFGA-backed create flow. MongoDB persists the resource document, while OpenFGA is the PDP for who can see, use, or manage it.
For private resources, the creator's direct owner tuple derives management
rights. For team resources, team members get use/read access and team admins get
the manager relationship. Team membership and team-admin status are evaluated by
OpenFGA checks; Mongo team fields are metadata and compatibility context, not the
primary authorization decision.
Credential OAuth Connector Flow
The Connections & Secrets OAuth connector flow is a CAIPE credential-exchange
flow, not a Keycloak login broker flow. Provider client IDs/secrets are seeded
from .env in Docker Compose or ESO in Kubernetes into encrypted MongoDB
connector records. Users then create or relink per-provider connections from the
Connections page. The browser navigates in the same tab so the OAuth callback
keeps the signed-in CAIPE session context:
The /credentials page is feature-flagged by CAIPE_CREDENTIALS_ENABLED and
then gated by OpenFGA organization membership (can_use organization:<org_key>). The Admin → Settings → Credentials tab is separately
feature-flagged and visible only for organization admins (can_manage organization:<org_key>), even if a non-admin has read-only baseline admin
surface grants.
The browser never receives provider tokens or decrypted secret material. Local
development may use http://localhost redirect URIs, but production connector
redirect URIs must use HTTPS. The final callback page includes a return link and
still broadcasts a connection event for tabs that are listening. Built-in
GitHub and Webex connector bootstrap normalizes legacy local
http://localhost:3001/oauth/{provider}/callback values to the CAIPE UI callback
route at /api/credentials/oauth/{provider}/callback, so the provider returns
to the BFF route that stores the encrypted token set.
After a connection exists, the Connections page can run Check GitHub Profile,
Check Atlassian Profile, Check Webex Profile, or Check PagerDuty
Profile. The browser calls the BFF profile-check route for its own connection id; the BFF verifies the session,
loads only connections owned by the signed-in Keycloak sub, refreshes the
provider token server-side, calls the provider profile endpoint, and returns a
small redacted profile summary. Atlassian checks also fall back to
/oauth/token/accessible-resources when the User Identity /me endpoint returns
403, so operators can distinguish a valid OAuth grant from a denied profile API.
The route also returns a redacted diagnostics checklist for the Connections page
modal: connection ownership, refresh-token acceptance, provider profile status,
and Atlassian accessible-resource/scope status where applicable. Each diagnostic
includes operator guidance such as relinking the provider or asking an Atlassian
admin to review User Identity API access. The route never returns the OAuth
access or refresh token.
The Connections page also performs an automatic, browser-safe refresh pass on
load for connected providers whose access token is expired or within the refresh
threshold. The BFF POST /api/credentials/connections/{id}/refresh route verifies
the same session ownership, refreshes the provider token server-side, persists the
new encrypted access-token reference/expiry metadata, and returns only refresh
metadata (ok, provider, and expiry interval), never token material.
Runtime callers that need a provider access token use
POST /api/credentials/exchange instead. That route is non-browser only: it
rejects Origin/Referer/cookie requests, validates the service bearer JWT, checks
the expected credential-service audience header, and supports either an explicit
provider_connection_id or a provider key such as atlassian. Provider-key
exchange selects the connected provider record owned by the JWT sub, so a
Dynamic Agent invocation receives the signed-in user's Atlassian token without
hard-coding a per-user connection id. Explicit connection-id exchange still
requires either ownership by JWT sub or OpenFGA
secret_ref:provider_connection:<id>#can_use before returning a refreshed
provider access token.
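The "non-browser only" rule for the exchange route can be sketched as a header guard. This is a simplified illustration: browserGuard and the rejection labels are hypothetical, header names are assumed to be lower-cased already, and the sketch does not validate the JWT signature or the credential-service audience header that the real route checks.

```typescript
type HeaderMap = Record<string, string | undefined>;

type Rejection =
  | "origin_present"
  | "referer_present"
  | "cookie_present"
  | "missing_bearer";

// Returns a rejection label, or null when the request looks like a
// service-to-service call carrying a bearer token.
function browserGuard(headers: HeaderMap): Rejection | null {
  if (headers["origin"]) return "origin_present";
  if (headers["referer"]) return "referer_present";
  if (headers["cookie"]) return "cookie_present";
  const auth = headers["authorization"] || "";
  if (!/^Bearer\s+\S+$/i.test(auth)) return "missing_bearer";
  return null;
}
```

Rejecting on the mere presence of browser-style headers keeps the route unusable from a page context even if a session cookie is valid.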
For Jira MCP, Dynamic Agents keeps the user's Keycloak JWT on Authorization for
MCP authentication and injects the exchanged Atlassian OAuth token on
X-CAIPE-Provider-Token. Jira treats that header as a provider Bearer token and
does not require ATLASSIAN_EMAIL for that OAuth path; static API-token Basic
auth remains available when impersonation tokens are disabled.
GET /api/credentials/inject/atlassian is implemented as the future BFF
contract for AgentGateway-style provider-token injection, but AgentGateway v0.12
does not support backend-level HTTP extAuthz response-header injection. Until
that gateway capability exists, the active Jira path keeps the exchange in the
connector/runtime layer: Dynamic Agents resolves the user-specific Atlassian
token through credential exchange and Jira MCP consumes it from
X-CAIPE-Provider-Token.
Webex Space ReBAC and Bot Dispatch
Webex follows the Slack bot trust model with Webex spaces in place of channels. The bot treats Webex as an external event source, not as an identity provider: every protected message must map to a Keycloak user, a CAIPE team, an OpenFGA-backed space route, and a user/resource allow decision before dispatch.
Failure categories are explicit and fail closed: WEBEX_USER_NOT_LINKED,
WEBEX_WORKSPACE_UNCONFIGURED, WEBEX_SPACE_TEAM_NOT_FOUND,
WEBEX_OBO_FAILED, WEBEX_ROUTE_DENIED, missing_space_grant, and
pdp_unavailable. Audit records use component=webex_bot and hash Webex person
IDs before logging.
WEBEX_USER_NOT_LINKED is handled privately by default. In a group space, the
bot sends only a generic thread notice and delivers the signed SSO link in a 1:1
Webex Adaptive Card addressed to the requesting personId. If the 1:1 send
fails, the fallback message asks the user to open the app and retry linking
without exposing the signed URL in the shared room. Slack-style implicit Webex
profile linking remains an explicit user-choice path and requires strict Webex
org, verified-email, no-conflict, and audit checks before it can bind
webex_user_id without an SSO click.
For Webex spaces, the raw room UUID is the policy identifier in
webex_space:<alias>--<space>. Public Webex room IDs are decoded from
ciscospark://us/ROOM/<uuid> before MongoDB/OpenFGA lookups and re-encoded only
for outbound Webex API calls.
Team Creation OpenFGA Sync
When an admin creates a team through POST /api/admin/teams, the Web UI
backend synchronizes three pieces of state in one shot: Mongo teams,
Mongo team_membership_sources, and OpenFGA (membership tuples). The
OpenFGA write is what makes team:<slug>#can_use resolve true for the
creator on subsequent requests like Dynamic Agent creation. Skipping
the OpenFGA step leaves team:<slug>#can_use false even though Mongo has
the membership row, and OWNER_TEAM_FORBIDDEN fires on the very next
agent-creation API call.
Phase 3 of spec 2026-05-24-derive-team-from-channel removed the per-team Keycloak client scope (team-<slug>). Team creation no longer touches Keycloak; teams are a pure Mongo+OpenFGA concept.
Why the creator gets both admin and member tuples even though admin
alone would satisfy can_use (model: can_use = member ∪ admin):
- The team Members tab in the Admin UI reads the Mongo members[] array verbatim. If the creator is only stored as role: 'owner', the tab continues to show them as the only member, which matches the visible Mongo intent.
- The redundant member tuple keeps the OpenFGA store self-describing: a future read of team:<slug>#member returns every human-or-admin member, not just the team admins.
- It costs one extra tuple per creator and removes a class of "I'm an admin, why don't list endpoints that filter by team#member include me?" bugs.
The same helpers (resolveKeycloakUserSubject and writeTeamMembershipTuples
in ui/src/lib/rbac/team-membership-sync.ts) are used by
POST /api/admin/teams/[id]/members so the add-member path is symmetric:
adding a member writes a member tuple, adding an admin writes an
admin tuple, and removing the last source for a relation deletes the
corresponding tuple.
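As a rough illustration of the creator tuples and the can_use = member ∪ admin rule, consider the sketch below. The helper names creatorTuples and teamCanUse are hypothetical and are not the helpers in ui/src/lib/rbac/team-membership-sync.ts; the in-memory tuple list stands in for the OpenFGA store.

```typescript
interface TeamTuple {
  user: string;                 // e.g. "user:<sub>"
  relation: "member" | "admin";
  object: string;               // e.g. "team:<slug>"
}

// Tuples written for the creator at team creation: admin plus the redundant member.
function creatorTuples(teamSlug: string, creatorSub: string): TeamTuple[] {
  const user = `user:${creatorSub}`;
  const object = `team:${teamSlug}`;
  return [
    { user, relation: "admin", object },
    { user, relation: "member", object },
  ];
}

// can_use resolves true when the user holds either relation on the team.
function teamCanUse(tuples: TeamTuple[], teamSlug: string, sub: string): boolean {
  return tuples.some(
    (t) => t.user === `user:${sub}` && t.object === `team:${teamSlug}`,
  );
}
```

With an empty tuple list, teamCanUse is false regardless of what Mongo holds, which mirrors the OWNER_TEAM_FORBIDDEN failure described above when the OpenFGA write is skipped.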
Team OpenFGA Sync Diagnostic
Even with the team-creation sync wired up correctly, a team can drift out
of step with OpenFGA over time: someone wrote a Mongo source row before
the user had logged in to Keycloak (so we couldn't yet resolve their
sub), the OpenFGA store was rebuilt without replaying tuples, or a
legacy team was created before the sync helpers existed. The Teams
settings dialog now surfaces drift directly to the admin instead of
hiding it behind backend logs.
The diagnostic reports four states per source row:
| State | Meaning | Admin action |
|---|---|---|
| synced | Source row has user_subject AND OpenFGA contains the matching user:<sub>#<relation> team:<slug> tuple | None |
| pending | Source row exists but user_subject is empty (e.g. the user has never signed in to Keycloak yet) | Wait for first sign-in, or click Reconcile once Keycloak knows the user |
| drifted | user_subject is resolved but OpenFGA is missing the matching tuple | Click Reconcile |
| unknown | OpenFGA read failed or the store is unconfigured | Check OpenFGA health in Security & Policy |
needs_attention on the summary is true if any row is drifted or
unknown. pending does not flip the banner red; it's an
informational state, not a failure mode.
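The four states and the needs_attention rule can be expressed as a short classification sketch. The names classifyRow and needsAttention are hypothetical, and the choice to report a row without user_subject as pending even when OpenFGA is unreadable is an assumption; the source does not state which state wins in that case.

```typescript
type SyncState = "synced" | "pending" | "drifted" | "unknown";

interface SourceRow {
  user_subject?: string;
  relation: "member" | "admin";
}

// tuplePresent is null when the OpenFGA read failed or the store is unconfigured.
function classifyRow(row: SourceRow, tuplePresent: boolean | null): SyncState {
  // Assumption: an unresolved subject is reported as pending before any lookup.
  if (!row.user_subject) return "pending";
  if (tuplePresent === null) return "unknown";
  return tuplePresent ? "synced" : "drifted";
}

// Only drifted and unknown rows flip the summary banner.
function needsAttention(states: SyncState[]): boolean {
  return states.some((s) => s === "drifted" || s === "unknown");
}
```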
The Reconcile endpoint is intentionally idempotent: write-on-already-present
is a no-op at the OpenFGA layer, so it is safe to invoke repeatedly. It
returns unresolved_emails for any source rows whose subject could not
be re-resolved (e.g. an invitee's Keycloak account still does not exist),
so the admin can chase those manually rather than spinning on the button.
Dynamic Agent Creation Ownership
New Dynamic Agents must be assigned to an owner team during creation. The Web UI
backend validates the selected team before writing any agent document: platform
admins can choose any team, while scoped team admins can choose only teams where
they are admin or owner.
AgentGateway MCP Endpoint Routing
Invariant. Every MCP server routed through AgentGateway must persist
an endpoint of the form {agentgateway_base}/mcp/<server_id>.
AgentGateway dispatches by path prefix (/mcp/<target>); a bare
{agentgateway_base}/mcp falls through to a non-registered route and
returns HTTP 404 Not Found on every probe and tool call. The class
first surfaced in production as the Confluence card showing:
Failed to connect to MCP server: HTTP 404 Not Found from
http://agentgateway:4000/mcp
Defence in depth. The invariant is enforced in four places; any one of them is sufficient on its own, but all four together mean a bad endpoint cannot persist for long:
| Layer | What it does | Code |
|---|---|---|
| Save-side normaliser (BFF) | POST/PUT /api/mcp-servers rewrites a bare gateway URL to /<server_id> form before insert/update, which prevents future drift | ui/src/lib/rbac/mcp-endpoint-normalizer.ts, ui/src/app/api/mcp-servers/route.ts |
| Editor picker (UI) | MCPServerEditor calls /api/mcp-servers/agentgateway/discover on open and offers a Pick AgentGateway target row that fills the endpoint with the canonical /<id> form | ui/src/components/dynamic-agents/MCPServerEditor.tsx |
| Read-side self-heal (runtime) | build_mcp_connection_config in dynamic-agents re-normalises against AGENT_GATEWAY_URL before handing the URL to the MCP transport, so legacy rows still work until repaired | ai_platform_engineering/dynamic_agents/src/dynamic_agents/services/mcp_client.py, services/mcp_endpoint_normalizer.py |
| One-shot repair script | scripts/fix-mcp-endpoint-routing.ts audits the mcp_servers collection (dry-run by default) and rewrites mis-shaped rows under --apply | scripts/fix-mcp-endpoint-routing.ts |
Direct upstream URLs are never rewritten. AgentGateway routing is
opt-in per server, and silently rewriting http://mcp-confluence:8000/mcp
would break stdio and in-cluster topologies. The normaliser detects
gateway endpoints by origin match against the configured
AGENT_GATEWAY_URL; anything else passes through unchanged.
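The origin-match rule can be sketched roughly as below. This is not the code in mcp-endpoint-normalizer.ts: the function names are hypothetical, and the mapping of path shapes to the reason labels (gateway_root_only for an empty path, bare_gateway_base for /mcp, wrong_target_suffix for /mcp/<other>) is an assumed reading of those labels. The sketch also ignores any path component on the gateway base URL.

```typescript
type RepairReason = "bare_gateway_base" | "gateway_root_only" | "wrong_target_suffix";

interface NormalizeResult {
  endpoint: string;
  reason: RepairReason | null; // null means the endpoint was left unchanged
}

// Split an http(s) URL into origin and path without relying on the URL global.
function splitOrigin(url: string): { origin: string; path: string } | null {
  const m = /^(https?:\/\/[^\/]+)(\/.*)?$/i.exec(url.trim());
  if (!m) return null;
  return { origin: (m[1] || "").toLowerCase(), path: m[2] || "" };
}

function normalizeGatewayEndpoint(
  endpoint: string,
  serverId: string,
  gatewayBase: string,
): NormalizeResult {
  const ep = splitOrigin(endpoint);
  const gw = splitOrigin(gatewayBase);
  // Direct upstream URLs (different origin) pass through untouched.
  if (!ep || !gw || ep.origin !== gw.origin) return { endpoint, reason: null };

  const canonical = `${gw.origin}/mcp/${serverId}`;
  const path = ep.path.replace(/\/+$/, "");
  if (path === `/mcp/${serverId}`) return { endpoint, reason: null };
  if (path === "") return { endpoint: canonical, reason: "gateway_root_only" };
  if (path === "/mcp") return { endpoint: canonical, reason: "bare_gateway_base" };
  if (path.slice(0, 5) === "/mcp/") return { endpoint: canonical, reason: "wrong_target_suffix" };
  return { endpoint, reason: null }; // unrelated gateway path: leave alone
}
```

Under these assumptions, http://agentgateway:4000/mcp for server confluence is rewritten to http://agentgateway:4000/mcp/confluence, while http://mcp-confluence:8000/mcp is returned unchanged.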
Config-driven rows are never rewritten by the repair script. Their
source of truth is config.yaml. If a config-driven row is mis-shaped,
the script logs it under untouchedConfigDriven so an operator can fix
the YAML instead.
Operator workflow for the repair script:
# Dry-run: prints which rows would change, no Mongo writes.
MONGODB_URI=mongodb://... \
AGENT_GATEWAY_URL=http://agentgateway:4000 \
npx ts-node scripts/fix-mcp-endpoint-routing.ts
# Apply the repairs (idempotent: re-running is a no-op).
MONGODB_URI=mongodb://... \
AGENT_GATEWAY_URL=http://agentgateway:4000 \
npx ts-node scripts/fix-mcp-endpoint-routing.ts --apply
The dry-run output includes a reason for each candidate
(bare_gateway_base, gateway_root_only, wrong_target_suffix)
so the admin can sanity-check the proposed change before committing.
Testing the repair script: the pure helpers (normalizeMcpEndpointForServer,
buildRepairPlan) are covered by scripts/__tests__/fix-mcp-endpoint-routing.test.ts
and run with the same invocation as the other scripts/__tests__/*.test.ts
files:
npx ts-node --compiler-options '{"module":"CommonJS"}' \
scripts/__tests__/fix-mcp-endpoint-routing.test.ts
The Mongo IO half of the script (main() / MongoClient.connect()) is left
to live verification because it needs a real database; the pure helpers cover
every classification path (bare_gateway_base, gateway_root_only,
wrong_target_suffix), the safety rules (direct upstream / config-driven /
no _id / no endpoint), and the customisable AgentGateway base URL.
0.5.1 Schema Migration Tab
Admins run release migrations from Admin → System → Migrations. The tab loads the 0.5.1 migration manifest, lets the admin select and dry-run each migration, and requires typing the exact confirmation string before applying writes.
init-idp.sh remains the first-run bootstrap escape hatch because it runs before
the Web UI backend is healthy and can use direct Keycloak admin credentials. It
prevents a chicken-and-egg dependency where BFF startup needs Keycloak client/realm
state that only BFF startup could create.
After that bootstrap, the Web UI backend owns the long-term Keycloak reconciliation
path through keycloak_rbac_mapping_reconciliation_v1. This migration is
code-backed in TypeScript rather than shell-backed by init-idp.sh; on BFF startup
it reconciles bot OBO permissions (token-exchange decision strategy, service-account
impersonation roles, realm-level users.impersonate scope-permission), resolves
BOOTSTRAP_ADMIN_EMAILS to Keycloak user ids, creates passwordless verified
placeholders for bootstrap emails that have not logged in, writes durable OpenFGA
admin tuples, records the run in Mongo migration tables, and leaves a blocking
migration status if the Keycloak repair fails. (Phase 3 of spec
2026-05-24-derive-team-from-channel removed the per-team and personal client-scope
branches from this migration; teams no longer touch Keycloak.) The header checks
GET /api/rbac/migration-status for every authenticated UI session so non-admin
users see the same "migrations required" indicator. Admins can inspect persisted
Keycloak run details, counts, warnings, and errors from GET /api/admin/keycloak/migration-health in Admin → Security & Policy → Keycloak.
The panel surfaces five high-signal tiles at the top (Schema area / Version /
Migration status / Last actor / Bootstrap admins) and the Keycloak Invariants
section below them with per-row Fix buttons as the actionable source of truth
for OBO token-exchange permission strategy, attached OBO policies, and
service-account impersonation roles. Bootstrap-admin
diagnostics (configured emails, resolved Keycloak subjects, placeholder
creations, tuple writes, per-email warnings) are still inspectable through the
Bootstrap admins tile at the top of the panel.
If the stored run is failed or the keycloak_rbac_mappings schema area is behind,
the Reconcile now button posts to the existing migration apply route for
keycloak_rbac_mapping_reconciliation_v1 and then reloads the health panel from
Mongo.
Dynamic Agent migrations include both tool tuple reconciliation and
agent_org_admin_inheritance_v1, which backfills
organization:<org>#admin manager agent:<id> for existing agents so
organization admins inherit can_manage without assigning owner teams to legacy
records.
Conversation authorization after the migration remains hybrid: if the caller owns
the conversation by owner_subject or legacy owner_id, the Web UI backend allows
the private owner path without a per-conversation OpenFGA owner tuple. Non-owners
must still pass explicit OpenFGA checks for shared conversation access.
OBO Token Exchange: Bot Identity Propagation
Badge analogy: The Slack or Webex bot is a courier service. When Alice asks the courier to pick something up from the server room on her behalf, the courier can't use their own badge, because the server room requires Alice's clearance. Instead, the courier goes to HR (Keycloak), presents their credentials and Alice's employee ID, and HR issues a delegated badge: it opens the same doors as Alice's badge, but it has a second chip that says "issued on behalf of Alice, presented by courier bot." The delegation chain is physically stamped on the badge, so it's auditable and unforgeable.
This is the hardest part to get right technically. Without OBO, every Slack or Webex request carries the bot's service account identity. OpenFGA would evaluate the bot instead of the human, and all per-user/team authorization would be meaningless.
RFC 8693 Token Exchange
OBO (On-Behalf-Of) is implemented via RFC 8693 token exchange. The bot uses its client_credentials grant to request a token impersonating a specific Keycloak user:
POST /realms/caipe/protocol/openid-connect/token
Content-Type: application/x-www-form-urlencoded
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&client_id=slack-bot
&client_secret=<bot-secret>
&subject_token=<bot-access-token>
&subject_token_type=urn:ietf:params:oauth:token-type:access_token
&requested_subject=<keycloak-user-id>
&requested_token_type=urn:ietf:params:oauth:token-type:access_token
&audience=${CAIPE_PLATFORM_AUDIENCE:-caipe-platform}
&scope=openid
Keycloak responds with an OBO JWT where:
- sub = the impersonated user's Keycloak ID
- email = the user's email
- act.sub = the bot's client ID; the delegation chain is cryptographically recorded
- aud includes caipe-platform by default because the bot's immediate next hop is the CAIPE UI BFF access-check/proxy surface, not AgentGateway
Phase 3 of spec 2026-05-24-derive-team-from-channel removed the legacy active_team JWT claim. The bot no longer requests a team-* scope and the OBO token no longer carries a team identifier. Team identity for a Slack channel or Webex space is now derived at request time from channel_team_mappings / webex_space_team_mappings (see "Channel-message dispatch" below). DM dispatch follows the personal chain (override → preference → dm_agent_id → default_agent_id → deny).
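For illustration, the form body of the token-exchange request could be assembled as in the sketch below. The function buildOboExchangeBody and its input shape are hypothetical, not the bot's real code. Following the Phase 3 note, the sketch requests only the openid scope and no team-* scope.

```typescript
interface OboExchangeInput {
  clientId: string;
  clientSecret: string;
  botAccessToken: string;
  keycloakUserId: string;
  audience?: string; // falls back to "caipe-platform", the documented default
}

// Builds an application/x-www-form-urlencoded body for the RFC 8693 request.
function buildOboExchangeBody(input: OboExchangeInput): string {
  const params: Array<[string, string]> = [
    ["grant_type", "urn:ietf:params:oauth:grant-type:token-exchange"],
    ["client_id", input.clientId],
    ["client_secret", input.clientSecret],
    ["subject_token", input.botAccessToken],
    ["subject_token_type", "urn:ietf:params:oauth:token-type:access_token"],
    ["requested_subject", input.keycloakUserId],
    ["requested_token_type", "urn:ietf:params:oauth:token-type:access_token"],
    ["audience", input.audience || "caipe-platform"],
    ["scope", "openid"],
  ];
  return params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}
```

The resulting string would be sent as the body of POST /realms/caipe/protocol/openid-connect/token with Content-Type: application/x-www-form-urlencoded.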
Bot → BFF Audience
Slack and Webex use the same audience model. The bot mints a team-agnostic
user OBO token for the next hop it is calling: the CAIPE UI BFF. That
is why CAIPE_PLATFORM_AUDIENCE defaults to caipe-platform. AgentGateway
still accepts agentgateway for direct data-plane callers and legacy paths,
but bot pre-dispatch checks should not mint aud=agentgateway.
The two load-bearing invariants are:
- Audience follows the next hop. Bot pre-dispatch calls target the CAIPE UI BFF, so OBO uses CAIPE_PLATFORM_AUDIENCE (caipe-platform by default). The same bearer can still be forwarded later because Dynamic Agents and AgentGateway accept caipe-platform.
- Team context is data-layer derived, not JWT-signed (Phase 3 of spec 2026-05-24-derive-team-from-channel). Channel/space → team mapping is read from MongoDB at every request, and the BFF + AgentGateway PDP evaluate the OpenFGA decision against that mapping. The OBO token is team-agnostic.
Sharing model: assigning a channel to a team transitively shares its agents
Channel-dispatch authorization deliberately uses the channel's mapped team as
the user-side subject of the can_use agent:<id> check
(team:<slug>#member can_use agent:<id>). This is stronger than a direct
per-user grant on the user → agent edge, because the OpenFGA model lets a
user reach the agent through any team they belong to that has the grant.
Operationally that means:
- Assigning channel C to team:T and then sharing any agent A with C (via the channel's can_use agent tuple) also makes A callable in C by every member of team:T, including members who were never granted A directly.
- Removing the channel→team assignment, or unsharing the agent from the channel, revokes that transitive access immediately on the next request.
- A DM with the same user does not inherit this channel→team cascade on its own: DM dispatch uses user:<sub> can_use agent:<id> and ignores channel/team mappings. However, the DM check does fall back to a team-union OpenFGA evaluation against existing team:<slug>#member can_use agent:<id> tuples (see evaluateAgentAccess), so any agent explicitly shared with a team via the Agent editor (next section) is callable in DM by every member of that team.
If an agent must stay private to a subset of a team, do not pin it to a channel that is mapped to that team. Either:
- Share the agent with a smaller team (or with individual users) and keep the channel mapped to the broader team for other agents, or
- Map the channel to a narrower team whose membership matches the intended audience for that agent.
The admin UI (Slack channel and Webex space ReBAC panels) surfaces this trade-off in the top-of-card "Sharing model" callout and in a per-channel heads-up under the agent-association form. Future work may add an optional per-channel agent allow-list that is stricter than the team-level grant; until then, the team cascade is the canonical policy and is documented behavior, not a bug.
Sharing model: explicit "Share with Teams" on an agent
The Agent editor (DynamicAgentEditor) has a "Share with Teams"
multi-select that operates on the same two-tuple inheritance pair as
the owner team, but additively β selecting a team T writes
team:T#member can_use agent:<id> and team:T#admin can_manage agent:<id> to OpenFGA without disturbing the owner-team tuples. The
practical consequence is:
- Every member of team T can DM the agent in a 1:1 chat (because the DM dispatch's team-union fallback resolves user:<sub> → team:T#member → can_use agent:<id>).
- Every member of team T can use the agent in any Slack channel or Webex space whose channel_team_mappings / webex_space_team_mappings row points at team T (because channel dispatch evaluates team:T#member can_use agent:<id> directly).
- Every admin of team T inherits can_manage on the agent and can edit, disable, or delete it from the admin surfaces.
Removing a team from the multi-select on the editor is symmetric:
POST/PUT /api/dynamic-agents walks the previous shared_with_teams
list against the new one and emits OpenFGA delete tuples for every
removed slug (via previousSharedTeamSlugs on
reconcileAgentRelationships). Until 2026-05-27 this field was
Mongo-only, so teams chosen in the multi-select were still silently
denied access; see agent_shared_team_grants_backfill_v1 for the
one-shot replay that fixes existing agents.
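The previous-versus-new comparison on save can be sketched as a simple diff that yields the tuple pairs to write and delete. The function diffSharedTeams and the string tuple format are illustrative assumptions, not the signature of reconcileAgentRelationships, and the sketch writes tuples only for newly added slugs.

```typescript
interface TupleChange {
  writes: string[];
  deletes: string[];
}

// Each shared team contributes the same two-tuple inheritance pair.
function tuplesForTeam(slug: string, agentId: string): string[] {
  return [
    `team:${slug}#member can_use agent:${agentId}`,
    `team:${slug}#admin can_manage agent:${agentId}`,
  ];
}

function diffSharedTeams(
  previous: string[],
  next: string[],
  agentId: string,
): TupleChange {
  const added = next.filter((slug) => previous.indexOf(slug) === -1);
  const removed = previous.filter((slug) => next.indexOf(slug) === -1);
  const expand = (slugs: string[]): string[] =>
    slugs.reduce<string[]>((acc, slug) => acc.concat(tuplesForTeam(slug, agentId)), []);
  return { writes: expand(added), deletes: expand(removed) };
}
```

Teams present in both lists produce no changes, so saving an unchanged selection is a no-op in this sketch.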
The "Effective access" callout under the multi-select is the
operator-facing render of exactly which team:<slug>#member tuples the
next save will write to OpenFGA, so admins can confirm the transitive
grant before the form is submitted.
/use default workflow (DM personal default)
/caipe-use default <agent_id> and /caipe-use default (no agent) update a
single per-user preference (dm_default_agent_id) in one round-trip. The
bot resolves the agent (or null), checks that the user has can_use on it
(when setting), then writes the preference. The next DM dispatches via the
personal chain and lands on the new default.
Both the override (live DM dispatch) and the preference (next-DM default) are cleared in a single round-trip when the user passes the bare form, matching FR-029a in spec 2026-05-24-derive-team-from-channel.
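The set-versus-clear behavior can be sketched as a single handler. This is an illustrative sketch under assumptions: the in-memory prefs dict stands in for the real preference store, dm_override_agent_id is a hypothetical name for the live DM override, and can_use is an injected callable rather than a real OpenFGA client.

```python
from typing import Callable, Dict, Optional


def handle_use_default(
    prefs: Dict[str, dict],
    user_sub: str,
    agent_id: Optional[str],
    can_use: Callable[[str, str], bool],
) -> str:
    """Apply '/caipe-use default [agent_id]' for one user."""
    record = prefs.setdefault(user_sub, {})
    if agent_id is None:
        # Bare form: clear both the stored default and the live override.
        record["dm_default_agent_id"] = None
        record["dm_override_agent_id"] = None  # hypothetical field name
        return "cleared"
    # Setting a default requires can_use on the target agent.
    if not can_use(user_sub, agent_id):
        return "denied"
    record["dm_default_agent_id"] = agent_id
    return "set"
```

A denied request leaves the stored preference unchanged, so a failed attempt never downgrades an existing default.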
Security Properties of OBO
| Property | Mechanism |
|---|---|
| Bot cannot forge a user identity | Keycloak only issues the OBO token if the bot's client_id has the token-exchange permission granted in the realm |
| Delegation is auditable | act.sub in the JWT records the bot as the delegating party, verifiable in any JWKS-aware system |
| User/team relationships are enforced, not bot identity | OpenFGA checks use the impersonated user's sub and team relationships from the OBO token context |
| Token expiry still applies | OBO tokens have the same exp as a normal Keycloak token; expired tokens are rejected at every JWKS validation point |
| Unlinked users are blocked at the edge | rbac_global_middleware in the Slack bot rejects unlinked users before they reach the supervisor; the linking prompt is sent at most once per SLACK_LINKING_PROMPT_COOLDOWN seconds (default: 3600) |
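The linking-prompt cooldown in the last row can be pictured as a small throttle. This is a sketch under assumptions, not the middleware's actual code: the class name is hypothetical, state is kept in memory, and the cooldown is assumed to apply per Slack user.

```python
import time
from typing import Callable, Dict


class LinkingPromptThrottle:
    """Allow at most one linking prompt per user per cooldown window."""

    def __init__(
        self,
        cooldown_seconds: float = 3600,  # mirrors the documented default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_prompt(self, slack_user_id: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(slack_user_id)
        if last is not None and now - last < self._cooldown:
            return False  # still inside the cooldown window
        self._last_sent[slack_user_id] = now
        return True
```

The request is rejected either way; the throttle only decides whether the rejection is accompanied by a fresh prompt.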
Slack Identity Linking (Auto-Bootstrap + JIT + Forced Link)
There are three onboarding paths, in priority order: (1) auto-link to existing Keycloak user, (2) JIT-create a new shell user (spec 103), (3) HMAC-signed link URL as fallback.
1. Auto-bootstrap (default, SLACK_FORCE_LINK=false)
On the user's first Slack message the bot:
- Calls Slack users.info and fetches profile.email
- Queries the Keycloak Admin API for a user with that exact email
- If found: writes the slack_user_id attribute, so the user is linked silently with zero user action required
- If not found: continues to step 2 (JIT) below
2. Just-In-Time user creation (default ON, SLACK_JIT_CREATE_USER=true)
When no existing Keycloak user matches the Slack email, and JIT is enabled, the bot:
- Optionally checks the email domain against SLACK_JIT_ALLOWED_EMAIL_DOMAINS (comma-separated allowlist; empty = any domain).
- POSTs to /admin/realms/{realm}/users using the same KEYCLOAK_SLACK_BOT_ADMIN_* credentials (caipe-platform service account, holds realm-management:{view-users, query-users, manage-users}).
- Creates the user as federated-only: no password, no required actions, emailVerified=true, with attributes slack_user_id, created_by=slack-bot:jit, created_at=<RFC3339>.
- Stays race-safe: an HTTP 409 from a concurrent create is resolved by re-querying the email and returning the surviving UUID.
- On failure (4xx/5xx/network), logs event=jit_failed error_kind=<auth_failure|forbidden|server_error|network_error|unexpected> and falls through to step 3.
JIT is default ON in dev so first-time DMs work without an admin handshake. Set SLACK_JIT_CREATE_USER=false in production if you want web-UI onboarding to be a hard prerequisite; in that case all unknown emails go to the link URL below.
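The allowlist check and the shape of the created user can be sketched as follows. This is an illustration under assumptions rather than the create_user_from_slack helper itself: the function names are hypothetical, and using the email as username is an assumption. The payload keys follow Keycloak's user representation, where attribute values are lists of strings.

```python
from datetime import datetime, timezone
from typing import Optional


def domain_allowed(email: str, allowed_domains_csv: str) -> bool:
    """Check an email against a comma-separated allowlist (empty = any domain)."""
    allowed = {d.strip().lower() for d in allowed_domains_csv.split(",") if d.strip()}
    if not allowed:
        return True
    _, _, domain = email.rpartition("@")
    return domain.lower() in allowed


def build_jit_user(email: str, slack_user_id: str, now: Optional[datetime] = None) -> dict:
    """Build a federated-only user payload: no credentials, no required actions."""
    now = now or datetime.now(timezone.utc)
    return {
        "username": email,  # assumption: the real helper may derive this differently
        "email": email,
        "enabled": True,
        "emailVerified": True,
        "requiredActions": [],
        "attributes": {
            "slack_user_id": [slack_user_id],
            "created_by": ["slack-bot:jit"],
            "created_at": [now.isoformat()],  # RFC 3339 compatible for UTC datetimes
        },
    }
```

Omitting a credentials entry is what keeps the account password-less in this sketch; the user can only sign in through the upstream IdP.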
Single-credential design (spec 103, plan R-8). JIT deliberately reuses the existing
caipe-platform admin client rather than introducing a separate caipe-slack-bot-provisioner. This trades strict privilege separation (one secret can both read and create users) for operational simplicity (one Secret to manage, one rotation procedure, one audit identity). Compensating mitigations: only the create_user_from_slack helper writes /users; init-idp.sh and realm-config.json pin the service account to exactly {view-users, query-users, manage-users}; all JIT actions are logged with stable event=jit_* tokens for SIEM.
3. Explicit link URL (fallback or SLACK_FORCE_LINK=true)
Whenever auto-link returns no user and JIT is disabled / domain not allow-listed / JIT failed, the bot DMs an HMAC-signed URL:
/api/auth/slack-link?slack_user_id=U09TC6RR8KX&ts=1713196400&sig=<HMAC-SHA256>
The HMAC signature uses SLACK_LINK_HMAC_SECRET, prevents forged links, and is time-bound (TTL enforced server-side). After OIDC login, the server writes slack_user_id to the Keycloak user via the Admin API.
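A signed, time-bound link of this shape could be produced and verified roughly as below. This is a minimal sketch: the signed message layout (slack_user_id and ts joined by a colon), the hex encoding, and the 600-second TTL are all assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode


def sign_link(secret: bytes, slack_user_id: str, ts: int) -> str:
    """HMAC-SHA256 over an assumed 'slack_user_id:ts' message, hex-encoded."""
    message = f"{slack_user_id}:{ts}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def build_link(secret: bytes, slack_user_id: str, ts: Optional[int] = None) -> str:
    ts = int(time.time()) if ts is None else ts
    query = urlencode(
        {"slack_user_id": slack_user_id, "ts": ts, "sig": sign_link(secret, slack_user_id, ts)}
    )
    return f"/api/auth/slack-link?{query}"


def verify_link(
    secret: bytes,
    slack_user_id: str,
    ts: int,
    sig: str,
    ttl_seconds: int = 600,  # assumed TTL; the real value is enforced server-side
    now: Optional[int] = None,
) -> bool:
    now = int(time.time()) if now is None else now
    if not 0 <= now - ts <= ttl_seconds:
        return False  # expired, or timestamped in the future
    expected = sign_link(secret, slack_user_id, ts)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

Because the Slack user ID is part of the signed message, a valid signature for one user cannot be replayed to link a different Slack account.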
The user always gets an actionable path forward; the previous "contact your admin" dead-end was removed in spec 103 (FR-007).
In all three modes, once the link is established, all future Slack messages carry the user's Keycloak identity automatically, with no repeated login.
Privacy in logs
All log lines that reference a Slack profile email run it through mask_email() (spec 103 FR-010): alice@corp.com → ali***@corp.com. The domain stays visible for SIEM tenant attribution; the local part is redacted.
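A masking helper consistent with that example might look like the sketch below. It is not the project's mask_email(): keeping exactly three leading characters is inferred from the single example, and the handling of malformed input is an assumption.

```python
def mask_email(email: str, visible: int = 3) -> str:
    """Keep the first few characters of the local part and the full domain."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        # Not an email address at all: reveal nothing (assumed behavior).
        return "***"
    return f"{local[:visible]}***@{domain}"
```

With this rule a local part of three characters or fewer stays fully visible, so a stricter variant may be preferable for short addresses.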
Slack Channel → Team + Agent ReBAC
Badge analogy: Each Slack channel is a dedicated help-desk line. An admin assigns the line to a team and grants one or more Dynamic Agents to that line. When a user calls in, the operator checks both the channel grant and the user's team/agent relationship before patching them through.
How It Works
Slack channel routing now separates "which team owns this channel?" from "which Dynamic Agents may be used here?" The workspace key is a configured alias (SLACK_WORKSPACE_ALIAS, for example CAIPE) rather than Slack's opaque team_id; the Slack bot maps incoming team_id values to that alias before looking up routes or grants. When a message arrives, the Slack bot reads OpenFGA tuples for slack_channel:<workspace_alias>--<channel_id> user agent:<id>, then joins optional slack_channel_agent_routes metadata for listen mode and priority. Stale Mongo route rows without a matching OpenFGA tuple are ignored. Operators can set SLACK_AGENT_ROUTES_MODE to config for static-only routing or to db_only to use only UI-managed OpenFGA-backed routes. The selected agent is then verified against OpenFGA:
- Team lookup: query channel_team_mappings in MongoDB by slack_channel_id.
- Optional first-message auto-assignment: when SLACK_AUTO_ASSIGN_UNMAPPED_CHANNELS=true and no active mapping exists, write the configured SLACK_DEFAULT_TEAM_SLUG mapping, the default slack_channel:<workspace_alias>--<channel_id> user agent:<id> OpenFGA tuple, and matching route metadata.
- Team-agnostic OBO mint: mint the user's OBO token without a team-* scope (Phase 3 of spec 2026-05-24-derive-team-from-channel removed the per-team scope mint; channel → team is now resolved from channel_team_mappings at every request).
- Channel association lookup: read OpenFGA channel-agent tuples and join Mongo route metadata only for tuple-backed agents.
- Channel ReBAC check: call the Slack channel access checker for slack_channel:<workspace_alias>--<channel_id> can_use agent:<id> and the user's team/agent relationship (team derived from the channel mapping).
- Route: dispatch to the selected agent_id only after both the channel association and user/team agent grant allow the request.
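The two-sided check in the steps above can be sketched with in-memory data. This is a deliberately simplified illustration, not the Slack bot's checker: real decisions go through OpenFGA's model evaluation, whereas here tuples are a plain set of triples, team membership is a dict, teams are keyed by slug, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

FgaTuple = Tuple[str, str, str]  # (subject, relation, object)


def channel_object(workspace_alias: str, channel_id: str) -> str:
    return f"slack_channel:{workspace_alias}--{channel_id}"


def authorize_channel_dispatch(
    tuples: Set[FgaTuple],
    channel_team: Dict[str, str],        # channel_id -> team slug (stand-in for channel_team_mappings)
    team_members: Dict[str, Set[str]],   # team slug -> user subs (stand-in for membership tuples)
    workspace_alias: str,
    channel_id: str,
    user_sub: str,
    agent_id: str,
) -> Tuple[bool, str]:
    """Allow dispatch only if both the channel and the user's team grant the agent."""
    team = channel_team.get(channel_id)
    if team is None:
        return False, "no_team_mapping"
    agent = f"agent:{agent_id}"
    if (channel_object(workspace_alias, channel_id), "user", agent) not in tuples:
        return False, "no_channel_association"
    if user_sub not in team_members.get(team, set()):
        return False, "user_not_in_team"
    if (f"team:{team}#member", "can_use", agent) not in tuples:
        return False, "no_team_grant"
    return True, "allow"
```

Every failure path returns a deny, which matches the fail-closed behavior described for route misses.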
The Slack YAML config still registers channels and remains the fallback route source in the default db_prefer mode. Runtime channel-agent authorization lives in OpenFGA; Mongo route rows are non-authoritative metadata and are deleted when the admin deletes the channel-agent association. The OpenFGA Policy Graph overlays channel_team_mappings as read-only assigned_team routing metadata edges so operators can see channel ownership next to OpenFGA grants without treating that ownership as a mutable tuple.
The Slack Channels admin panel also includes Slack Runtime Diagnostics for the selected channel. It calls /api/admin/slack/channels/{workspaceId}/{channelId}/diagnostics to perform the same OpenFGA tuple read shape used by the Slack bot, compare tuple-backed agents with slack_channel_agent_routes, flag stale Mongo metadata that runtime ignores, flag listen-mode mismatches such as mention-only routes that will ignore plain messages, and show the latest slack_bot runtime error from audit_events.
Slack route misses fail closed without turning ambient channel chatter into bot noise. For plain channel messages, the bot still records OpenFGA read failures and listen-mode mismatches for Slack Runtime Diagnostics, but it does not post a route-miss notice unless the user explicitly invoked the bot. During initial setup, SLACK_INTEGRATION_SILENCE_ENV=true stops Slack handlers before they can send user-visible responses at all. Diagnostics remains the operator-facing path for common errors: stale metadata without an OpenFGA tuple can be removed, and mention-only/message-only routes can be updated to listen: all.
Keycloak Role → ReBAC Transition Check
The transition comparison API is intentionally read-only and engineer-facing:
- Engineers call /api/rbac/enforcement-comparison with a subject/action/resource plus observed identity/group context.
- The API checks the same relationship in OpenFGA; legacy realm-role classification is historical-only.
- If the resource type is rebac_enforced, matching per-resource roles are reported as ignored and the effective decision comes only from ReBAC.
Admin UI
Admins configure channel/team ownership in Admin → Teams → selected team → Slack Channels and channel/agent grants in Admin → Integrations → Slack.
- Channel/team ownership is exclusive: a channel cannot be actively mapped to two teams.
- Channel/agent associations are many-to-many OpenFGA tuples: a channel can have multiple Dynamic Agent associations.
- Removing an association deletes the OpenFGA tuple and its saved Mongo listen/priority metadata, denying that resource in the channel even if the user has access elsewhere.
- UI-managed route dispatch is the default with static YAML fallback (SLACK_AGENT_ROUTES_MODE=db_prefer). Set config only for static YAML routing, and use db_only only after the channel's OpenFGA-backed UI routes are complete.
- Runtime auto-assignment is opt-in with SLACK_AUTO_ASSIGN_UNMAPPED_CHANNELS=true, SLACK_DEFAULT_TEAM_SLUG, and SLACK_DEFAULT_AGENT_ID. It only handles channels with no active mapping and never changes an already assigned channel.
- Runtime sync/reload uses the Web UI backend as the browser-facing boundary. caipe-ui authorizes the admin user, calls the Slack bot admin API with a Keycloak client-credentials token, and the Slack bot verifies that token with JWKS before exposing route status, cache reload, or static-config upsert sync.
- Deep links that include subtab=slack or openfgaTab=slack canonicalize to Admin → Integrations → Slack, even if an older link still carries cat=system&tab=settings.
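The three SLACK_AGENT_ROUTES_MODE values can be summarized as a route-source selector. This is a sketch under assumptions: the function name is hypothetical, routes are opaque list items, and db_prefer is assumed to fall back to YAML only when a channel has no UI-managed routes at all.

```python
from typing import List, Sequence


def select_route_source(mode: str, db_routes: Sequence, yaml_routes: Sequence) -> List:
    """Pick which routes are eligible for a channel under a given mode."""
    if mode == "config":
        return list(yaml_routes)   # static YAML only
    if mode == "db_only":
        return list(db_routes)     # UI-managed, OpenFGA-backed routes only
    if mode == "db_prefer":
        # Default: prefer UI-managed routes, fall back to static YAML.
        return list(db_routes) if db_routes else list(yaml_routes)
    raise ValueError(f"unknown SLACK_AGENT_ROUTES_MODE: {mode!r}")
```

Under db_only a channel with incomplete UI routes has nothing to fall back on, which is why the doc recommends switching only after those routes are complete.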
MongoDB Collection: channel_team_mappings
{
"_id": ObjectId,
"slack_channel_id": "C0123456789",
"team_id": "6612...",
"channel_name": "#k8s-support",
"slack_workspace_id": "CAIPE",
"created_by": "admin@example.com",
"created_at": ISODate,
"active": true
}
OpenFGA Tuple: Slack Channel Agent Association
slack_channel:CAIPE--C0123456789 user agent:my-k8s-agent
The channel-agent association lives in OpenFGA. The agent:<id> value is the Dynamic Agent slug (string _id in the dynamic_agents collection). The legacy slack_channel_grants collection may exist during migration, but it is not an allow source for Slack runtime channel-agent decisions.
MongoDB Collection: slack_channel_agent_routes
{
"workspace_id": "CAIPE",
"channel_id": "C0123456789",
"agent_id": "my-k8s-agent",
"enabled": true,
"priority": 100,
"users": { "enabled": true, "listen": "mention" },
"source_type": "manual",
"status": "active",
"created_by": "admin@example.com",
"created_at": "2026-05-12T00:00:00.000Z"
}
This row is metadata for a matching OpenFGA tuple. It does not authorize dispatch by itself, and it is deleted when the channel-agent association is deleted.
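The "metadata only" rule can be sketched as a join that starts from the tuple-backed agents and treats everything else as stale. This is an illustrative sketch: the function name is hypothetical, and it does not model listen modes, priority ordering, or the enabled flag.

```python
from typing import Dict, List, Set, Tuple


def join_route_metadata(
    tuple_backed_agents: Set[str], route_rows: List[Dict]
) -> Tuple[List[Dict], List[str]]:
    """Split route rows into usable metadata and stale rows.

    A row is usable only if its agent_id has a matching OpenFGA tuple;
    rows without one are reported as stale and never authorize dispatch.
    """
    usable = [row for row in route_rows if row["agent_id"] in tuple_backed_agents]
    stale = [row["agent_id"] for row in route_rows if row["agent_id"] not in tuple_backed_agents]
    return usable, stale
```

The stale list corresponds to what Slack Runtime Diagnostics flags as Mongo metadata that runtime ignores.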
Web UI Object-Level Checks
For UI-owned resource surfaces, the BFF performs the coarse session or legacy scope gate first and then checks the concrete OpenFGA object before returning or proxying data.
Current strict surfaces include:
- conversation:<id> for chat list/read/write/share/stream and message persistence
- skill:<id> for catalog/config/hub file and scan access
- admin_surface:rag_datasources for RAG Data Sources tab administration
- knowledge_base:<id> for RAG proxy paths, datasource list filtering, search filter injection, and direct RAG API/MCP checks
- agent:<id> for Dynamic Agent listing and mutation
- mcp_server:agentgateway for AgentGateway discovery/sync
- mcp_server:<id>#can_discover for the Create Agent → Tools Probe button (probing only enumerates advertised tool metadata, so it is gated on can_discover rather than can_invoke; the model already grants discover to organization members, organization admins, team-shared members, owners, and channel/group routings)
- system_config:platform_settings for platform configuration
Conversation checks use implicit owner access first and explicit OpenFGA relationships for non-owner access. RAG proxy calls still forward the Keycloak bearer token after the BFF PDP decision, so RAG validates issuer, audience, signature, and expiry with Keycloak before checking OpenFGA using team-derived knowledge_base relationships.
The Dynamic Agent built-in tool catalog at GET /api/dynamic-agents/builtin-tools is intentionally not strict-gated: the catalog is a static metadata listing of supported built-in tool types (web_search, file_io, etc.), is needed by every authenticated user who can open the Create Agent wizard, and per-tool authorization happens at MCP invocation time. The route requires an authenticated session and forwards the bearer token to dynamic-agents (where DA_REQUIRE_BEARER still applies); it does not consult OpenFGA.
Task Builder routes are intentionally excluded from this pass because they are scheduled for refactor.
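The ordering of the conversation check (coarse gate, then implicit owner, then explicit relationship) can be sketched as below. This is not the BFF's code: the function name, the owner_sub and id field names, and the relation string are hypothetical, and fga_check is an injected callable standing in for an OpenFGA client.

```python
from typing import Callable, Dict


def can_access_conversation(
    session_authenticated: bool,
    user_sub: str,
    conversation: Dict[str, str],
    relation: str,
    fga_check: Callable[[str, str, str], bool],
) -> bool:
    """Coarse gate first, then owner shortcut, then the object-level check."""
    if not session_authenticated:
        return False  # coarse session gate failed
    if conversation.get("owner_sub") == user_sub:
        return True   # implicit owner access, no OpenFGA call needed
    # Non-owner: require an explicit relationship on the concrete object.
    return fga_check(f"user:{user_sub}", relation, f"conversation:{conversation['id']}")
```

Checking ownership before calling the policy engine means the common case, a user reading their own chat, does not depend on an explicit tuple existing.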
Compact End-to-End Request Flow (Reference)
A condensed text-only version of the per-request sequence above. Useful for runbooks and incident-response playbooks where a Mermaid diagram is overkill.
Slack User: "What's the status of my ArgoCD deployment?"
──────────────────────────────────────────────────
STEP 1: Identity Resolution (Slack Bot)
──────────────────────────────────────────────────
slack_user_id U09TC6RR8KX
→ Keycloak Admin API lookup by attribute
→ user: { id: "a3f9...", email: "alice@example.com" }
RFC 8693 exchange → OBO JWT
sub=alice, act.sub=slack-bot
──────────────────────────────────────────────────
STEP 2: Supervisor Ingestion (A2A + LangGraph)
──────────────────────────────────────────────────
POST /a2a Authorization: Bearer OBO_JWT
→ OAuth2Middleware: validates RS256 signature against JWKS
→ JwtUserContextMiddleware: decodes claims, stores in ContextVar
→ agent_executor: get_jwt_user_context() → email=alice
→ LangGraph selects ArgoCD MCP tool
──────────────────────────────────────────────────
STEP 3: Policy Enforcement (AgentGateway)
──────────────────────────────────────────────────
POST /argocd/... Authorization: Bearer OBO_JWT
→ ext_authz: OpenFGA check for caller/team/tool relationship → ALLOW
→ Proxy to ArgoCD MCP Server
──────────────────────────────────────────────────
STEP 4: MCP Tool Execution (ArgoCD MCP Server)
──────────────────────────────────────────────────
Validates OBO JWT against Keycloak JWKS independently
Extracts email=alice, tenant=acme
Returns deployments scoped to alice's tenant
──────────────────────────────────────────────────
Response path: MCP → Gateway → Supervisor → Slack → User