Roles vs Scopes: How CAIPE RBAC Decides What You Can Do
Audience: Anyone who has heard the words "role" and "scope" thrown around and isn't 100% sure why we have both, or what actually happens when you click "Create team" in the admin UI.
This doc has two layers: a plain-English explanation first, then the precise technical detail. Read the analogy, then the technical section; they describe the same thing.
The Plain-English Version
Imagine CAIPE is a corporate office building. To get something done you need two things at every door:
- A badge that lists what you're qualified to do. You're trained to operate the espresso machine, you have first-aid certification, you're authorized to enter Lab B. These are your roles.
- A name tag that says which team you're representing right now. You might belong to both the Platform team and the Security team. Today you're attending the Security team's standup, so your name tag says "Security." Tomorrow it might say "Platform." This is your active team, set by a scope.
The security guard at every door checks both:
"Does your badge say you can enter Lab B? ✓ And does your name tag say you're representing a team that's allowed in Lab B today? ✓ OK, go in."
If you have the qualification but you're representing the wrong team, you're denied. If your name tag is right but you don't have the qualification, you're denied. You need both.
That's the whole model. The rest of this doc is just the technical mapping of those two ideas onto Keycloak.
Why both? Why not just one?
Because qualifications and team context are different things that change at different rates.
- Qualifications (roles) are about you as a person: "Sri can use the Jira search tool." That's stable; it travels with you.
- Team context (active team scope) is about which hat you're wearing right now: "Sri is acting as a member of the Platform team in this request." That can change request-to-request, especially in Slack where the same user might post in #platform-eng (Platform team) and #security-ops (Security team) within minutes.
If we tried to bake team context into the role list ("Sri-can-use-Jira-as-Platform", "Sri-can-use-Jira-as-Security"), the number of roles would grow as users × teams, and switching teams would mean rewriting your role assignments. With the scope-as-context-tag design, your roles stay constant; only the per-request active_team claim changes.
What is a "slug"?
A slug is a URL-safe, lowercase, ASCII-only short identifier derived from a human-readable name. Think of it as the team's machine-readable handle.
| Human name | Slug |
|---|---|
| Platform Engineering | platform-engineering |
| SRE – On Call | sre-on-call |
| 🚀 Rocket Squad | rejected (produces empty slug) |
The rules (enforced by isValidTeamSlug and deriveSlug in ui/src/app/api/admin/teams/route.ts):
- Lowercase letters, digits, and hyphens only
- No leading or trailing hyphens
- Maximum 63 characters
- Must produce a non-empty result after stripping non-ASCII
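The rules above can be sketched as a small validator. This is a Python illustration only, not the actual isValidTeamSlug from route.ts (which is TypeScript); the function name, the constant, and the regular expression are assumptions transcribed from the four bullets.

```python
import re

# Assumed limit and pattern, transcribed from the rules listed above.
MAX_SLUG_LENGTH = 63
# First and last characters must be a letter or digit; hyphens only in between.
_SLUG_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_team_slug(slug: str) -> bool:
    """Return True if the slug satisfies the rules listed above (hypothetical helper)."""
    if not slug or len(slug) > MAX_SLUG_LENGTH:
        return False
    # fullmatch rejects uppercase, spaces, non-ASCII, and leading/trailing hyphens.
    return _SLUG_PATTERN.fullmatch(slug) is not None
```

For example, is_valid_team_slug("platform-engineering") passes, while "-platform", "Platform Eng", and the empty string are all rejected.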
We use slugs (not display names) inside Keycloak and JWT claims for three reasons:
- Stable identifier. A team can be renamed in MongoDB without breaking every JWT in flight or every OpenFGA relationship tuple.
- Safe in URLs, headers, and identifiers. Keycloak client scope names, role names, and JWT claim values shouldn't contain spaces or Unicode.
- One canonical form. No ambiguity between "Platform Eng" / "platform-eng" / "platform engineering": the slug is the only string the system actually uses for matching.
The display name is a UI concern; the slug is the identity.
What is a Keycloak "client scope"?
In OpenID Connect, a client is anything that asks Keycloak for a token (CAIPE UI, Slack bot, AgentGateway). A scope is a named bundle of claims: when a client requests a token "with scope X," Keycloak attaches whatever claims scope X is configured to inject.
In stock OIDC, scopes usually look like openid (gives you sub), email (gives you email), profile (gives you given_name, family_name, etc.). The pattern is: the client asks for the scope, the scope adds claims to the token.
We use that exact mechanism for team context. We define one client scope per team:
- team-personal: when present, injects active_team=__personal__ into the JWT.
- team-platform-eng: injects active_team=platform-eng.
- team-security-ops: injects active_team=security-ops.
- …one per team.
Each scope is just a Keycloak object with a single protocol mapper of type oidc-hardcoded-claim-mapper configured to write active_team=<slug> into the access token. The scope itself has no permission semantics; it's purely a claim-injection vehicle.
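To make the "one scope, one hardcoded mapper" shape concrete, here is a rough sketch of the JSON body such a scope could be created with. The function name and the mapper name are hypothetical, and the config keys reflect my understanding of Keycloak's hardcoded claim mapper; check them against your Keycloak version before relying on them.

```python
def build_team_scope_payload(slug: str) -> dict:
    """Build a client-scope representation for one team (illustrative only)."""
    # The personal sentinel maps to the team-personal scope, as listed above.
    scope_name = "team-personal" if slug == "__personal__" else f"team-{slug}"
    return {
        "name": scope_name,
        "protocol": "openid-connect",
        "protocolMappers": [
            {
                "name": "active-team",  # hypothetical mapper name
                "protocol": "openid-connect",
                "protocolMapper": "oidc-hardcoded-claim-mapper",
                "config": {
                    "claim.name": "active_team",
                    "claim.value": slug,
                    "jsonType.label": "String",
                    "access.token.claim": "true",
                },
            }
        ],
    }
```

Note that nothing in the payload mentions users or permissions: the scope only describes which claim to write.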
Default vs optional client scopes
A client (e.g. agentgateway, caipe-slack-bot) can have client scopes attached two ways:
- Default scope: always added to every token issued for that client.
- Optional scope: only added when the client explicitly requests it via the scope parameter.
We bind team-<slug> scopes as default scopes on the CAIPE_PLATFORM_AUDIENCE client (caipe-platform by default) because Keycloak's RFC 8693 token-exchange (used in OBO, On-Behalf-Of) silently drops the scope request parameter. The audience client's default scopes are the only reliable way to inject the claim during token-exchange.
The known caveat (documented in architecture.md around line 625): with multiple team-<slug> scopes all bound as defaults, every hardcoded mapper fires on every token and each one overwrites active_team, so whichever mapper runs last wins, and Keycloak does not guarantee mapper ordering. We compensate by having the Slack bot's OBO module verify that the returned JWT's active_team claim matches what was requested; a mismatch raises OboExchangeError. Follow-up work tracked in Spec 104 is to switch to a script-mapper that reads the requested team from a custom request parameter rather than per-team default scopes.
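The compensating check is simple enough to sketch. This is not the bot's actual code: the function name is hypothetical, the exception class is a local stand-in for the OboExchangeError named above, and the claims are assumed to be an already-decoded JWT payload.

```python
class OboExchangeError(Exception):
    """Local stand-in for the error named in the text."""


def verify_active_team(claims: dict, requested_team: str) -> str:
    """Fail closed unless the minted token carries the team that was requested."""
    returned = claims.get("active_team")
    if returned != requested_team:
        # A missing claim (None) is treated the same as a wrong claim.
        raise OboExchangeError(
            f"active_team mismatch: requested {requested_team!r}, got {returned!r}"
        )
    return returned
```

The point of the design is that a nondeterministic mapper order turns into a visible error rather than a token that silently speaks for the wrong team.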
Why do we still need OpenFGA membership if the JWT already has active_team?
Excellent question, and this is the security crux of the design.
The active_team claim in the JWT only says what team this token claims to represent. It does not prove the user actually belongs to that team. The claim is injected by a hardcoded mapper that doesn't check anything.
So if we trusted active_team alone, an attacker (or a buggy client) who could trigger a token-exchange with scope=team-finance-prod would get a token claiming active_team=finance-prod, even if they've never been added to the Finance team.
The OpenFGA tuple user:<sub> member team:<slug> is the proof of membership. It is written by team membership APIs and identity-group sync. Unlike the scope-injected claim, this tuple is stored server-side in the authorization graph and cannot be self-asserted by manipulating the token request.
AgentGateway now delegates the gateway decision to OpenFGA through ext_authz.
The equivalent relationship facts are represented as tuples:
user:<sub> member team:<slug>
team:<slug>#member can_call tool:<tool-or-prefix>
The active_team claim says "this request is acting as team X." OpenFGA and the team-membership source store now say "this user is actually a member of team X." The conjunction is what makes the system safe.
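A toy model makes the conjunction easier to see. The sketch below is not how OpenFGA evaluates a model; it is an in-memory set of (subject, relation, object) tuples, and the function name and the ledger_export tool are made up for illustration.

```python
# Toy stand-in for the relationship store: (subject, relation, object) triples.
RelationTuple = tuple[str, str, str]


def is_allowed(tuples: set[RelationTuple], sub: str, active_team: str, tool: str) -> bool:
    """Allow only if the claimed team is real membership AND the team may call the tool."""
    is_member = (f"user:{sub}", "member", f"team:{active_team}") in tuples
    team_can_call = (
        f"team:{active_team}#member", "can_call", f"tool:{tool}"
    ) in tuples
    return is_member and team_can_call


store = {
    ("user:alice", "member", "team:platform-eng"),
    ("team:platform-eng#member", "can_call", "tool:jira_search_issues"),
    ("team:finance-prod#member", "can_call", "tool:ledger_export"),
}
```

With this store, alice acting as platform-eng can call jira_search_issues, but a token claiming active_team=finance-prod gets her nothing, because no membership tuple backs the claim.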
Defense-in-depth recap
| Layer | What it asserts | Who controls it | Can it be forged? |
|---|---|---|---|
| active_team=<slug> claim | "This request is on behalf of team <slug>" | The OBO token-exchange request (via scope binding) | Indirectly, but the slack-bot verifies the returned claim matches the requested scope, raising OboExchangeError on mismatch |
| user:<sub> member team:<slug> tuple | "OpenFGA records this user as a member of team <slug>" | Team membership APIs and identity-group sync | No: it is checked server-side in OpenFGA |
| team:<slug>#member can_call tool:<tool> tuple | "Members of this team may invoke <tool>" | Team Resources / ReBAC policy authoring | No: OpenFGA evaluates stored tuples, not client-provided claims |
Compromise any one and you still don't get in. That's the point.
What happens to team_member:<slug> now?
team_member:<slug> is now a temporary compatibility role, not the source of truth for new writes. Unlike active_team (which is a hardcoded-claim mapper on a team-<slug> client scope), team_member:<slug> is just a standard Keycloak realm role. Existing assignments may still appear in realm_access.roles, but new manual team membership writes no longer create or assign the role.
So the question has two parts.
What is written when a user joins a team?
When an admin adds the user to a team via POST /api/admin/teams/<id>/members, the handler at ui/src/app/api/admin/teams/[id]/members/route.ts:
- Updates MongoDB teams.members for the product UI.
- Resolves the Keycloak sub for the user's email when the user exists.
- Stores a manual team-membership source record.
- Writes an OpenFGA tuple: user:<sub> member team:<slug>.
The mirror happens on remove: after the last active source is gone, the route deletes the OpenFGA membership tuple. It does not remove or create Keycloak team_member:<slug> assignments.
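The "last active source" rule is easy to get wrong, so here is a minimal sketch of it. Everything in it is hypothetical (class name, method names, and source labels such as "manual" and "idp-sync"); it only models the bookkeeping, not the real route handler or the OpenFGA client.

```python
class TeamMembership:
    """Toy model: one membership tuple per (user, team), kept while any source is active."""

    def __init__(self) -> None:
        self.sources: dict[tuple[str, str], set[str]] = {}
        self.fga_tuples: set[tuple[str, str, str]] = set()

    @staticmethod
    def _tuple(sub: str, slug: str) -> tuple[str, str, str]:
        return (f"user:{sub}", "member", f"team:{slug}")

    def add(self, sub: str, slug: str, source: str) -> None:
        # Record the source, then make sure the tuple exists (idempotent).
        self.sources.setdefault((sub, slug), set()).add(source)
        self.fga_tuples.add(self._tuple(sub, slug))

    def remove(self, sub: str, slug: str, source: str) -> None:
        active = self.sources.get((sub, slug), set())
        active.discard(source)
        if not active:
            # Last source is gone: drop the membership tuple as well.
            self.sources.pop((sub, slug), None)
            self.fga_tuples.discard(self._tuple(sub, slug))
```

A user added both manually and through identity sync keeps the tuple when only one of those sources is removed.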
Existing team_member:<slug> roles can remain until every older consumer is gone, but they should not be used for new authorization design.
What appears in the JWT?
New access is represented in OpenFGA, not in realm_access.roles. The next token issued for the user still carries coarse roles such as chat_user or admin_user, plus the signed active_team claim when a team context is requested. It does not need a new resource or team-membership role to make the OpenFGA check work.
When that next token is minted depends on the client:
- Web UI (caipe-platform): on the next NextAuth login, or on the next refresh-token grant. Practically: the user sees any realm-role change after their session refreshes (or after they log out and back in).
- Slack bot (caipe-slack-bot): on the next OBO token-exchange. The bot does a fresh exchange on every Slack message and doesn't cache OBO tokens across requests, so any realm-role change appears on the user's next Slack message.
- Other flows: on the next client_credentials or password grant for that user.
There is no cached realm-role list in the clients. Keycloak reads realm_access.roles from its database at token-mint time, but OpenFGA relationship updates take effect as soon as the tuple write succeeds.
Summary table
| Step | What happens | Where |
|---|---|---|
| 1. Admin clicks "Add member" in team UI | POST /api/admin/teams/<id>/members | ui/src/app/api/admin/teams/[id]/members/route.ts |
| 2. Mongo teams.members updated | source of truth for the admin UI | teams collection |
| 3. Keycloak user is resolved by email | gives the stable sub used in OpenFGA subjects | ui/src/lib/rbac/keycloak-admin.ts |
| 4. OpenFGA membership tuple is written | user:<sub> member team:<slug> | ui/src/app/api/admin/teams/[id]/members/route.ts |
| 5. Next team-scoped token mint | Keycloak signs active_team=<slug> when requested | Keycloak client scope mapper |
| 6. AgentGateway ext_authz evaluates | OpenFGA checks the subject/resource tuple graph | deploy/openfga/bridge/main.py |
Are team memberships embedded in the token?
Not for new writes. Existing team_member:<slug> compatibility roles may still appear in realm_access.roles, but the authoritative membership set now lives in OpenFGA and the team-membership source store.
A user who belongs to teams platform-eng, security-ops, and infra should have a compact JWT like:
    {
      "iss": "http://localhost:7080/realms/caipe",
      "sub": "...",
      "aud": "agentgateway",
      "realm_access": {
        "roles": [
          "chat_user"
        ]
      },
      "active_team": "platform-eng"
    }
Only one active_team claim is present: the one corresponding to the team the bot requested via the team-<slug> scope on this specific token-exchange. The fact that the user belongs to platform-eng is stored as an OpenFGA tuple:
user:<sub> member team:platform-eng
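When debugging, it helps to look at what a token actually carries. The helper below is a hypothetical inspection aid that decodes the payload segment of a JWT without verifying the signature, so it must never be used for authorization decisions; real validation checks the signature against Keycloak's JWKS.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT's payload for inspection only. Does NOT verify the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

On a token shaped like the example above, the result has a single active_team string and a short realm_access.roles list, with no per-team entries.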
Why this is correct (and intentional)
The two pieces of information have different lifetimes and different semantics:
| Property | What it represents | When it changes | Cardinality on a token |
|---|---|---|---|
| OpenFGA membership tuples | "The relationship graph records which teams this user belongs to" | When team membership APIs or identity sync reconcile | 0 in the token (stored in OpenFGA) |
| active_team claim | "Which team this single request is on behalf of" | On every token-exchange (every Slack message, potentially) | 1 (exactly one value) |
The OpenFGA relationship check at AgentGateway is what reduces N memberships to a single allow/deny decision:
check(user:<sub>, can_call, tool:<tool>)
The selected team context is the key bit. OpenFGA doesn't ask "is the user a member of any team?"; it asks whether the user, through the current team/resource relationship graph, can call this specific tool. The active_team claim selects which of the user's memberships is being exercised; the relationship tuples are the universe of what the user could exercise.
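For readers who have not used OpenFGA, the check(...) shorthand corresponds to a request body along these lines for OpenFGA's Check endpoint. The function name is hypothetical, and how the bridge in deploy/openfga/bridge/main.py folds active_team into the evaluation is not shown here because this doc does not specify it.

```python
def build_check_request(sub: str, tool: str) -> dict:
    """Build the body of an OpenFGA Check call for 'may this user call this tool?'."""
    return {
        "tuple_key": {
            "user": f"user:{sub}",
            "relation": "can_call",
            "object": f"tool:{tool}",
        }
    }
```

The response to a Check call is a single allowed/denied answer, which is why N memberships never need to appear in the token.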
Implications
- Token size does not grow with team count. Membership fan-out is stored in OpenFGA instead of realm_access.roles.
- The "last mapper wins" caveat is only about active_team. With multiple team-<slug> scopes bound as defaults on the CAIPE_PLATFORM_AUDIENCE client, every hardcoded mapper fires on every token and the last one wins (Keycloak does not guarantee mapper ordering). The slack-bot's obo_exchange.impersonate_user verifies the returned active_team matches the requested team and raises OboExchangeError on mismatch.
- AGW gets the right answer from OpenFGA only when active_team is right. If Keycloak minted a wrong active_team, the OpenFGA check would evaluate the wrong team context. The bot's pre-flight check (requested != returned -> reject) is what prevents that privilege confusion.
- There is no CEL or role-array iteration. AgentGateway delegates to OpenFGA through ext_authz, and OpenFGA checks stored tuples.
Mental model in one line
OpenFGA says what hats you own. The active_team claim says which hat you're wearing right now. AGW checks the relationship graph for that selected context.
Roles vs Scopes: the technical reference
Roles: what a caller is allowed to do
Realm roles (carried in jwt.realm_access.roles) are assigned to users. They answer "is this user permitted?"
| Tier | Examples | Meaning |
|---|---|---|
| Coarse identity | chat_user, admin_user | Default user / global admin. admin_user bypasses every tool/agent/team gate. |
| Resource-scoped (legacy compatibility) | tool_user:jira_search_issues, tool_user:jira_*, tool_user:*, agent_user:test-april-2025, agent_admin:test-april-2025 | Transitional JWT role shape. New grants are OpenFGA relationships, not new Keycloak realm roles. |
| Team membership (legacy compatibility) | team_member:<slug> | Older JWT role shape for membership. New writes use OpenFGA user:<sub> member team:<slug> tuples. |
Materialization: Team membership and Team Resources now write OpenFGA tuples. Keycloak remains responsible for coarse bootstrap/global roles and the active_team client-scope claim.
Scopes: which team context the token represents
Client scopes are not permissions. They are per-team claim injectors that determine the value of the active_team claim in the issued JWT.
| Scope | active_team claim value | Bound to | Bound how |
|---|---|---|---|
| team-personal | __personal__ | caipe-slack-bot | optional (provisioned by the realm-init script on every boot) |
| team-<slug> | <slug> | caipe-platform (CAIPE_PLATFORM_AUDIENCE) | default (auto-created on team creation, plus startup auto-sync for pre-existing teams) |
| team-<slug> | <slug> | caipe-slack-bot | optional (for code symmetry with team-personal) |
Why we need scopes at all: the active_team value has to be signed into the JWT itself so AgentGateway and Dynamic Agents can trust it without a callback to MongoDB. Keycloak's RFC 8693 token-exchange silently drops the scope request parameter, so the audience client's default scopes are the only reliable injection path.
How they combine: the actual gate at AgentGateway
A typical relationship check for a tool call:
check(user:<sub>, can_call, tool:<name>)
A request is allowed only if all three line up:
- Identity facts say who the caller is (sub, email, realm roles).
- OpenFGA tuples say the caller is related to the team/resource being used.
- Scope caused the JWT to carry active_team=<slug> when a team context is required.
If a user belongs to teams A and B, the Slack bot picks the team via OBO: obo_exchange.impersonate_user(active_team=...) adds scope=openid team-<slug> to the exchange, and then verifies that the returned JWT's active_team claim matches. A mismatch raises OboExchangeError (load-bearing security invariant).
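As a rough picture of what the exchange request carries, the sketch below builds RFC 8693 form parameters with the team scope attached. It is not the bot's obo_exchange code: the function name is hypothetical, and any Keycloak-specific parameters the real module may send (for impersonation, for example) are deliberately left out because this doc does not list them.

```python
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_exchange_params(subject_token: str, audience: str, active_team: str) -> dict:
    """Form parameters for a token-exchange request scoped to one team (illustrative)."""
    # The personal sentinel uses the team-personal scope; real teams use team-<slug>.
    scope_name = "team-personal" if active_team == "__personal__" else f"team-{active_team}"
    return {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
        "scope": f"openid {scope_name}",
    }
```

These parameters would be sent form-encoded to the realm's token endpoint. Because the scope parameter may be dropped during the exchange, the returned token still has to be checked against the requested team.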
What happens when you click "Create team"?
End-to-end flow when an admin POSTs to /api/admin/teams (ui/src/app/api/admin/teams/route.ts):
- Validate input. Derive slug from name if not provided; reject if the slug is invalid or already in use.
- Insert team document into MongoDB (teams collection) with the creator as owner.
- Call ensureTeamClientScope(slug) (ui/src/lib/rbac/keycloak-admin.ts), which idempotently:
  - Creates a Keycloak client scope named team-<slug> (POST /client-scopes).
  - Adds an oidc-hardcoded-claim-mapper to the scope, configured to inject active_team=<slug> into the access token.
  - Binds the scope as a default scope on the CAIPE_PLATFORM_AUDIENCE client (caipe-platform by default).
  - Binds the scope as an optional scope on the caipe-slack-bot client.
- If scope provisioning fails, the Mongo insert is rolled back. We never want a team without its scope, because that team's channels would silently fail OBO token-exchange.
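The insert-then-provision-then-roll-back ordering can be sketched with a dict standing in for the teams collection. This is a Python illustration of the flow above, not the TypeScript handler; create_team and the ensure_scope callable are hypothetical names, and slug validation is assumed to have happened already.

```python
from typing import Callable


def create_team(teams: dict, slug: str, owner: str,
                ensure_scope: Callable[[str], None]) -> dict:
    """Insert the team, provision its scope, and undo the insert if provisioning fails."""
    if slug in teams:
        raise ValueError(f"slug already in use: {slug}")
    doc = {"slug": slug, "owner": owner, "members": []}
    teams[slug] = doc  # stands in for the MongoDB insert
    try:
        ensure_scope(slug)  # stands in for ensureTeamClientScope(slug)
    except Exception:
        # Roll back so no team exists without its client scope.
        del teams[slug]
        raise
    return doc
```

The caller still sees the original provisioning error, but the store is left as if the request had never happened.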
What is not done at team creation:
- No realm role is created. No team_member:<slug> role is auto-created at team-creation time, and new membership writes do not create one either.
- No client roles are created.
- No users are auto-assigned to anything. The team has zero members until MongoDB membership and OpenFGA tuples are written.
TL;DR mental model
- Role = capability. Granted to users. Answers "can this human do X?"
- Scope = context tag. Bound to clients. Answers "which team is this token speaking on behalf of?"
- Slug = the team's machine name. Lowercase, hyphenated, ASCII. Used everywhere internal.
- Team creation = new client scope (so a JWT can be minted with active_team=<slug>).
- Team membership = OpenFGA tuple user:<sub> member team:<slug>.
- Tool/agent permission = OpenFGA tuple such as team:<slug>#member can_call tool:<id> or team:<slug>#member can_use agent:<id>.
Entity diagram: how roles, scopes, JWTs, and resources relate
The data model in one picture. Everything below the dashed lines is owned by Keycloak; the diagram intentionally omits the IdP itself, the JWKS publishing channel, and the OpenFGA bridge code (those live in the enforcement diagram below) so the entities you can point at in admin or in a JWT payload stand out.
How to read it
- A USER is granted REALM_ROLEs directly only for coarse/bootstrap behavior such as admin_user or chat_user.
- A USER is a member of a TEAM through an OpenFGA tuple user:<sub> member team:<slug>.
- A TEAM has a 1:1 CLIENT_SCOPE named team-<slug> whose only job is to inject active_team=<slug> into JWTs minted with that scope. The team and the scope are paired but live in different stores (teams are in MongoDB; scopes are Keycloak objects).
- A JWT carries (a) the user's full role list at realm_access.roles and (b) at most one active_team claim sourced from whichever team scope was bound to the request.
- TOOL and AGENT are resource entities in OpenFGA. Team/resource relationships, not role names, grant access.
Why both OpenFGA membership AND active_team?
This is the security crux and the only "redundancy" in the model worth keeping. OpenFGA stores who belongs to the team, while the JWT claim is the request's assertion of which team context is active. Both are needed so a caller cannot grant themselves team access by manipulating the token request.
What's deliberately not in this diagram
| Thing | Why it's not here | Where it does live |
|---|---|---|
| Keycloak (the IdP itself) | It owns every other entity above; drawing it as a node adds clutter without information. | Implicit. Mentioned in Component 1. |
| JWKS / signing keys | Cryptographic plumbing: concerns key rotation, not data shape. | Workflows → JWT validation. |
| AgentGateway ext_authz / per-agent auth code | Those are enforcement code, not data. | The enforcement diagram below, deploy/agentgateway/config.yaml, and deploy/openfga/bridge/main.py. |
| OIDC clients (caipe-platform, caipe-slack-bot, agentgateway) | The client mostly affects audience and which scopes are bound by default, not the per-request decision shape. | Component 1 → OIDC clients table. |
| ROLE_ASSIGNMENT / TEAM_MEMBERSHIP join tables | These are mechanical M:N joins that the ER notation already represents with the relationship line. | n/a. |
Enforcement: how decisions actually get made (PEP & PDP)
The data model says what facts exist. Decisions are made by separate components that consume the JWT.
AgentGateway is the MCP PEP: it validates the Keycloak JWT locally and then calls the OpenFGA bridge through Envoy-compatible ext_authz. Dynamic Agents still enforce per-agent route checks before forwarding user tokens downstream. Both depend on the JWT being trustworthy (signed by Keycloak, verified against JWKS), but gateway tool authorization is now relationship-backed rather than CEL-authored.
What can each caller actually do?
Three entry points, three slightly different flows, but all converge on the same JWT-based PEP at AgentGateway.
1. Web UI user
Key point about the Web UI: today the UI client (caipe-platform) doesn't bind any team-<slug> scope, so its JWTs usually ship without an active_team claim. Web UI authorization is therefore mediated by Web UI backend route gates and OpenFGA relationship checks rather than by editing AgentGateway rules.
2. Slack user posting in a channel
Key point about Slack channels: the channel ID determines the team via MongoDB (channel_team_resolver), and the bot pre-checks team membership before OBO. AgentGateway then checks the OpenFGA tuple graph for the resulting user/team/resource relationship.
3. Slack user in a DM (1:1 with the bot)
Key point about DMs: the __personal__ sentinel says "no channel team; this user is acting as themselves." AgentGateway still delegates the decision to OpenFGA, but the bridge evaluates user-scoped relationships instead of requiring a channel team membership.
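The channel and DM cases differ only in where the team context comes from, which a few lines can illustrate. This is a hypothetical sketch: a plain dict stands in for the MongoDB-backed channel_team_resolver, and raising on an unmapped channel is an assumption (fail closed), not documented behavior.

```python
PERSONAL_TEAM = "__personal__"


def resolve_active_team(channel_id: str, is_dm: bool, channel_teams: dict) -> str:
    """Pick the team context for a Slack message (illustrative only)."""
    if is_dm:
        # DMs have no channel team; the user acts as themselves.
        return PERSONAL_TEAM
    try:
        return channel_teams[channel_id]
    except KeyError:
        # Assumption: an unmapped channel is rejected rather than guessed.
        raise LookupError(f"no team mapped to channel {channel_id}") from None
```

The returned value is what the bot would then request as active_team in the OBO exchange.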
Side-by-side: who needs what?
| Caller | active_team claim | Required roles to invoke tool_user:<X> | Source of team context |
|---|---|---|---|
| Web UI user (non-admin) | ❌ absent | Matching OpenFGA relationship, usually via Web UI backend-mediated resource access | (no team context today; Spec 104 follow-up) |
| Web UI user (admin) | ❌ absent | admin_user (bypasses everything) | n/a (global admin) |
| Slack channel user | ✅ <team-slug> | tool_user:<X> AND team_member:<team-slug> | MongoDB channel_team_resolver (channel ID → team slug) |
| Slack DM user | ✅ __personal__ | tool_user:<X> only | sentinel __personal__ (no team) |
| Slack admin in any channel/DM | varies | admin_user (bypasses everything) | sentinel or team slug, but bypassed |
Cross-references
- Architecture → Component 1 (Keycloak): full role/scope tables, env vars, IdP brokering.
- Architecture → Spec 104 active_team section: what changed, components touched, failure modes.
- Spec 104 (team-scoped RBAC): original design doc.
- Workflows → OBO token-exchange: sequence diagrams for how scopes turn into JWT claims at runtime.
- File map: find the source file that owns any piece of the auth path.