Active-team Token-scoped RBAC β Design
Status: Proposed Owner: Platform Eng Spec: 104 β Team-scoped RBAC Last updated: 2026-04-22
Why this existsβ
Today, the RBAC stack has all the right ingredients but assembles them inconsistently:
- The Slack bot mints a user-scoped token (RFC 8693 token-exchange / Keycloak impersonation) β good.
- It then drops that token somewhere between the bot, the BFF, and
dynamic-agents. Live evidence: AgentGateway logs showjwt.sub = service-account-caipe-slack-boteven though the bot logs sayOBO impersonation succeeded. - "Team scope" is half-implemented: the bot resolves a CAIPE
team_idfromchannel_team_mappings, but only the RAG path forwards it (asX-Team-Id). The chat path drops it entirely. X-Team-Idis an unsigned out-of-band header β easy to forge, easy to drop, invisible in audit logs that only capture the JWT.
This design replaces the ad-hoc header with a signed JWT claim called
active_team that rides end-to-end inside the user's access token.
What active_team isβ
active_team is a custom claim in the user's signed access token. It tells
every downstream service which one of the user's teams the current request is
acting on behalf of.
Example token (decoded)β
{
"sub": "user-uuid-of-srady",
"preferred_username": "sraradhy",
"realm_access": {
"roles": [
"chat_user",
"team_member",
"team_member:team-platform-eng",
"team_member:team-sre",
"tool_user:jira_search_issues"
]
},
"active_team": "team-platform-eng",
"aud": "agentgateway",
"iss": "http://keycloak:7080/realms/caipe",
"exp": 1776999000
}
Why a signed claim instead of X-Team-Idβ
| Property | X-Team-Id (header) | active_team (claim) |
|---|---|---|
| Tampering | Forgeable by anything in the request chain | Signed by Keycloak; AGW verifies the signature |
| Audit trail | Lost between hops; needs separate log correlation | One token reveals sub + active_team + aud together |
| Cross-service plumbing | Must be re-added at every hop (BFF, DA, AGW) β easy to drop, which is exactly today's bug | Forwarded automatically with Authorization: Bearer |
| Replay/anomaly detection | Custom logic | Same JWT lifetime + revocation rules as identity |
| AGW CEL evaluation | request.headers["x-team-id"] (untrusted) | jwt.active_team (trusted) |
| Membership safety | Bot must remember to enforce "user β team" before sending | Bot enforces at mint; AGW re-verifies via CEL |
Special valuesβ
active_team value | Meaning |
|---|---|
"<team-id>" | Request is acting on behalf of a specific CAIPE team. AGW enforces user is in that team. |
"__personal__" | Personal mode (Slack DM, web UI for users with no teams). No team-scoped tools allowed; only tool_user:*, tool_user:<name>, admin_user, chat_user rules apply. |
| (claim absent) | Treated identically to __personal__ for back-compat during rollout, but emits a warning log. Should not occur post-rollout. |
Locked design decisionsβ
These were chosen before writing any code:
| Decision | Choice | Rationale |
|---|---|---|
Where active_team is set | Per-team Keycloak client scope containing a hardcoded-claim mapper; bot requests scope=team-<id> at token-exchange | Confirmed working in spike (Experiment D, see below). Per-token (not per-user), stateless, no race conditions, no per-request user mutation |
| Membership enforcement | Both bot AND AGW check that user is in active_team (defense-in-depth) | A buggy or compromised bot can't grant access to a team the user isn't in |
| Channel without a team | Reject in Slack with "ask your admin" message | Group channels are always team-scoped |
DM (im) | Mint with active_team="__personal__" | Users can still chat 1:1 with the bot using their personal tool scopes |
| Web UI | Auto-pick the user's first/only team (or __personal__ if none) | Avoids extra UI work; team picker is a follow-up |
| Rollout | Big-bang β replace AGW CEL atomically; no feature flag | One source of truth; less risk of half-migrated drift |
X-Team-Id | Removed everywhere (chat + RAG + UI) | One mechanism, not two |
Sequence diagramsβ
Reading guide. The Keycloak boxes show what's pre-configured (left of the colon) and what fires per request (right of the colon). The team client scope is created once per team in the lifecycle diagram below; here we show how it gets used.
Diagram 0: Team lifecycle (one-time setup per team)β
This happens once when a team is created, not on every chat message.
Diagram 1: Happy path β Slack channel mapped to a teamβ
The full message-time flow. The blue box is where active_team enters
the JWT.
Diagram 2: DM / personal modeβ
The bot uses a special team-personal scope that hardcodes the marker
value. CEL recognizes the marker and skips team-membership checks.
Diagram 3: Rejection pathsβ
What happens when something is wrong. Most rejections happen in the bot, before any token gets minted β Path C is only triggered if the bot's state is stale.
File-by-file change summaryβ
| File | Change |
|---|---|
charts/ai-platform-engineering/charts/keycloak/scripts/init-token-exchange.sh | Add a "User Session Note" protocol mapper on caipe-slack-bot client: maps session note active_team β token claim active_team |
ai_platform_engineering/integrations/slack_bot/utils/obo_exchange.py | impersonate_user(user_id, *, active_team, audience="agentgateway") β sends session note via token-exchange. Delete downstream_auth_headers() X-Team-Id branch. |
ai_platform_engineering/integrations/slack_bot/app.py (_rbac_enrich_context) | Resolve channelβteam, verify membership, choose __personal__ for DMs, reject for unmapped group channels, then mint token with the right active_team |
ai_platform_engineering/integrations/slack_bot/sse_client.py | No change (already forwards Bearer). Drop any place that ever set X-Team-Id (none in chat path; sanity check). |
ui/src/lib/da-proxy.ts | No change. Already forwards Bearer as-is. Verify no X-Team-Id forwarding code exists. |
ai_platform_engineering/dynamic_agents/src/dynamic_agents/auth/jwt_middleware.py | Accept tokens with aud=agentgateway (in addition to its current expected audience). Log sub=, active_team=, aud= on success. |
ai_platform_engineering/dynamic_agents/src/dynamic_agents/services/mcp_client.py | Add log line in _factory: AGW outbound: sub=<sub> active_team=<at> server=<server_id>. Confirm no SA-fallback path exists. |
deploy/agentgateway/config.yaml | Rewrite CEL: replace bare "team_member" in roles checks with jwt.realm_access.roles.contains("team_member:" + jwt.active_team); add explicit __personal__ branch; keep tool_user:*/admin_user short-circuits. Avoid AGW 0.12 CEL forms has(...), in, and .exists(...) against JWT role arrays because the live playground shows they either return false or panic the gateway. |
ai_platform_engineering/knowledge_bases/rag/server/src/server/{rbac.py,restapi.py} | Remove X-Team-Id reads; replace with jwt.active_team claim. |
ui/src/app/api/rag/[...path]/route.ts, ui/src/app/api/rag/kb/[...path]/route.ts | Remove X-Team-Id forwarding. |
docs/docs/specs/098-enterprise-rbac-slack-ui/how-rbac-works.md, docs/docs/security/rbac/*.md | Document the new flow + sequence diagrams |
Tests across slack_bot/tests/, dynamic_agents/tests/, ui/src/**/__tests__/ | Update any test that asserts on X-Team-Id; add tests for active_team end-to-end |
Spike results (Keycloak 26.3.5, run 2026-04-22)β
We ran four experiments against the live keycloak container in the dev
docker-compose stack to determine which Keycloak mechanism can carry
active_team into a token-exchanged JWT.
Setupβ
- Keycloak 26.3.5, realm
caipe, clientcaipe-slack-bot, RFC 8693 token-exchange + impersonation already configured by the existinginit-token-exchange.shscript. - Test user:
admin@example.com(Keycloaksub=dc5460b8-1a56-4b58-a5e0-b163b110aded). - All token-exchange calls used the bot's
client_id+client_secretwithrequested_subject=<test-user-id>.
Resultsβ
| # | Approach | Outcome | Notes |
|---|---|---|---|
| A | "User Session Note" mapper, set the note via session_note:active_team=... request parameter | β Fails | Mapper installs cleanly (HTTP 201). Token mints. active_team claim does not appear. Pure token-exchange does not pass request params into the user session model β session notes are only populated during interactive flows or via the SPI. |
| B | "User Model Attribute" mapper + bot mutates user attribute via Admin API right before mint | β Works | Token contains "active_team": "team-platform-eng". Reliable but racy β concurrent requests for the same user with different teams would interfere unless we serialize on user_id. Adds two extra Admin API round-trips per request. |
| C | "Claims Parameter Token" mapper + OIDC claims=β¦ request parameter | β Fails | Mapper marshals only claims that already have a source (user attributes, roles). It cannot accept arbitrary values from the request. |
| D | One client scope per team named team-<id>, each containing an oidc-hardcoded-claim-mapper that hardcodes active_team=<id>; bot requests scope=team-<id> at mint time | β Works cleanly | Token contains "active_team": "team-platform-eng" only when the matching scope is requested. With include.in.token.scope=false the team name does NOT leak into the scope claim. Stateless; no per-request mutation; no concurrency hazard. |
Edge cases verified for Approach Dβ
| Edge | Result |
|---|---|
Bot requests a scope that doesn't exist (scope=team-nonexistent) | KC silently ignores it, token mints with no active_team claim. Bot must ensure scope exists before requesting it (one-time admin call when a team is created). |
| Bot requests no team scope at all | Token mints with no active_team claim. CEL can either treat absence as __personal__, OR we pre-create a team-personal scope with a hardcoded active_team="__personal__" mapper. Recommended: pre-create the explicit scope for auditability. |
| Same scope requested twice | Idempotent (mapper fires once). |
Bot requests two team scopes (scope=team-A team-B) | Both mappers fire; the resulting active_team claim becomes the value of whichever mapper Keycloak evaluates last (undefined). Bot MUST request exactly one team scope per token-exchange. |
Decisionβ
Adopt Approach D (per-team client scope with hardcoded-claim mapper) for production.
Why D over B (user-attribute):
- D is stateless (no DB writes per request).
- D has zero concurrency hazards.
- D requires only an admin API call when teams are created, not on every chat message.
- B's two extra round-trips on every Slack message (set attr β mint β revert attr) double the Keycloak load and create a window where a parallel request would see the wrong attribute.
Cost of D: the team-management code path (where teams are created and deleted in the admin UI) must also create/delete the matching client scope. This is a small extension to existing CRUD; we can either:
- Embed the call in the BFF team-create endpoint (simpler), or
- Add a Keycloak-sync background job that reconciles teams β scopes (more robust to drift; recommended for production).
Updated implementation plan (delta from original)β
| File | Original plan | Revised plan |
|---|---|---|
init-token-exchange.sh | Add User Session Note mapper on caipe-slack-bot client | (no change to bot client) Add idempotent block that ensures a team-personal client scope exists with a hardcoded mapper active_team="__personal__", and assigns it as an optional scope on the bot client. |
BFF/admin team CRUD (ui/src/app/api/admin/teams/route.ts) | (n/a) | On team create: call KC Admin API to create client scope team-<id> with hardcoded-claim mapper active_team=<id> and assign as optional scope on caipe-slack-bot. On team delete: remove scope. |
slack_bot/utils/obo_exchange.py impersonate_user(...) | Add active_team arg β session note | Add active_team arg β adds scope=team-<active_team> (or team-personal for __personal__) to the form-encoded body. Validation: must be exactly one team scope per call. |
| Everything else | unchanged | unchanged |
Audience handlingβ
The Slack-bot mint uses aud=agentgateway. dynamic-agents'
JwtAuthMiddleware currently expects its own audience. Two options:
- Have DA accept multiple audiences (
["dynamic-agents", "agentgateway"]). - Have the bot mint a token with
aud=["dynamic-agents", "agentgateway"](Keycloak supports multi-aud).
Both are fine; the second is cleaner long-term.
Web UI rolloutβ
The first-team auto-pick decision means UI sessions need to learn the user's
teams at sign-in and inject active_team into the BFF request. Two viable
implementations:
- BFF reads
session.user.teams[0]and re-mints a Keycloak impersonation token per request (matches the Slack flow exactly). - NextAuth refreshes the token at sign-in with
active_teamembedded.
Pick whichever lines up with how the BFF currently obtains the access token.
Dropping X-Team-Id from the RAG pathβ
The RAG team headers were a working contract. Removing them is a behavioural change for any client (including curl scripts and tests) that calls RAG directly. Inventory all callers before deleting the read-side; clients then get migrated in lockstep with the AGW CEL change.
Acceptance criteriaβ
The change is done when all of these are true:
- AgentGateway logs for a Slack-originated tool call show
jwt.sub=<user-uuid>(not the slack-bot service account). - AgentGateway access logs include
active_teamin the structured fields (either parsed from the JWT or echoed in CEL audit). dynamic-agentslog line on tool load readsConnected to 'jira': N tools (sub=<user>, active_team=<team-id>)withN > 0for users in the team.- A test user added to
team-platform-engonly, posting in#team-sre, gets the rejection message from Path B above and no AGW request is made. - A DM to the bot mints a token with
active_team="__personal__"and the user can use any non-team-scoped tool they're entitled to. grep -r "X-Team-Id" .returns zero hits in production paths (test fixtures and spec docs may still reference it for historical context).- AGW CEL contains no bare
"team_member" in roleschecks; every team-scoped rule usesjwt.active_team.
Relatedβ
- Spec 104 β Team-scoped RBAC
- Spec 098 β Enterprise RBAC + Slack UI
- Spec 102 β Comprehensive RBAC tests and completion
- How RBAC works (canonical reference)
- AgentGateway CEL config:
deploy/agentgateway/config.yaml