Secret bootstrap: dev, K8s Secrets, and External Secrets Operator (ESO)
Audience: platform engineers installing CAIPE with Helm. If you're running
docker-compose.dev.yaml you can skip this; the .env file already covers everything.
For the full RBAC/OpenFGA Kubernetes install and upgrade sequence, including which services are charted today and which services remain external prerequisites, start with the Helm installation and upgrade guide.
The Keycloak subchart needs five secrets to bootstrap a clean realm:
| Secret (in cluster) | Keys | Used by |
|---|---|---|
| <release>-keycloak-admin | username, password | both init Jobs (Keycloak Admin REST API) |
| <release>-keycloak-idp | IDP_CLIENT_SECRET | init-idp Job (configures upstream IdP broker, e.g. Okta/Duo) |
| <release>-keycloak-ui-client | OIDC_CLIENT_SECRET | init-idp / auth reconcile Jobs (reconciles Keycloak caipe-ui) |
| <release>-keycloak-platform-client | OIDC_CLIENT_SECRET | init-idp / auth reconcile Jobs (reconciles Keycloak caipe-platform supervisor) |
| <release>-keycloak-bot | KC_BOT_CLIENT_SECRET | init-token-exchange Job and the slack-bot deployment |
What is caipe-platform? It's the confidential OIDC client the supervisor uses for two flows: (1) client_credentials tokens for internal service-to-service calls, and (2) the target audience for on-behalf-of token exchange from the bots. The realm import ships a dev placeholder (caipe-platform-dev-secret) so first-boot import works without external state, but that placeholder is plaintext-visible in the rendered realm ConfigMap and must be replaced for any deployment that is not local dev/CI. See Migrating from the dev placeholder (caipe-platform) below.
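To make the two flows concrete, here is a minimal sketch of the form bodies a caller would POST to the realm token endpoint. The helper names, the realm name in TOKEN_PATH, and the bot client id passed in are assumptions for illustration only; the grant-type strings are the standard OAuth 2.0 and RFC 8693 values, not something taken from the CAIPE code.

```python
from urllib.parse import urlencode

# Assumption: the realm is named "caipe" and uses the usual Keycloak token path.
TOKEN_PATH = "/realms/caipe/protocol/openid-connect/token"


def client_credentials_form(client_secret: str) -> str:
    """Form body for flow (1): a service-to-service token for caipe-platform."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": "caipe-platform",
        "client_secret": client_secret,
    })


def token_exchange_form(bot_client_id: str, bot_secret: str, user_token: str) -> str:
    """Form body for flow (2): a bot exchanges a user token for one whose
    audience is caipe-platform (RFC 8693 token exchange)."""
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "client_id": bot_client_id,
        "client_secret": bot_secret,
        "subject_token": user_token,
        "audience": "caipe-platform",
    })
```

Either body would be sent with Content-Type application/x-www-form-urlencoded; the sketch stops short of making the HTTP call so it stays runnable without a Keycloak instance.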
There are three install paths; pick the one that matches your environment. All three use the same Helm chart; they differ only in how the underlying K8s Secrets are produced.
┌──────────────┐  ┌────────────────────────────┐  ┌────────────────────────────┐
│ (1) DEV      │  │ (2) PROD with K8s Secrets  │  │ (3) PROD with ESO          │
│              │  │                            │  │                            │
│ helm-managed │  │ kubectl create secret …    │  │ ExternalSecret pulls from  │
│ random pwds  │  │ then point chart at them   │  │ Vault/AWS-SM/GCP-SM/…      │
│              │  │                            │  │                            │
│ NO external  │  │ NO external dependency     │  │ ESO controller required    │
│ dependency   │  │                            │  │ + your secrets backend     │
└──────────────┘  └────────────────────────────┘  └────────────────────────────┘
Path 1: DEV (helm-managed random passwords)
Best for sandbox/PoC clusters. Do not use in production.
# values.yaml
keycloak:
  admin:
    username: admin
    password: "<set-a-stable-value>"   # required: the chart no longer generates an admin password
  idp:
    enabled: false   # local Keycloak users only; no upstream IdP
  tokenExchange:
    enabled: true
    # botClientSecret: ""   # leave empty → 32-char random
helm install caipe ./charts/ai-platform-engineering -n caipe --create-namespace
The chart will:
- Require a stable Keycloak admin Secret source (admin.password, admin.secretRef, or admin.externalSecret). The chart no longer generates random admin passwords because Keycloak stores the bootstrap password hash in its database.
- Generate the caipe-keycloak-bot Secret with KC_BOT_CLIENT_SECRET set to a random 32-character value.
- Run the init-token-exchange Job, which PUTs that secret to Keycloak, so the slack-bot pod and the Keycloak caipe-slack-bot client share one value end-to-end.
For disposable dev installs, set keycloak.admin.password explicitly and store
it somewhere recoverable. For shared environments, prefer admin.secretRef or
admin.externalSecret.
To read a manually-managed admin password later:
kubectl get secret caipe-keycloak-admin -n caipe -o jsonpath='{.data.password}' | base64 -d
Why no generated admin password? Keycloak consumes the admin password only when bootstrapping the initial user. Later upgrades must keep the Kubernetes Secret aligned with the password hash in Keycloak's database, or init jobs will receive 401 from the master realm.
Path 2: PROD with manually-managed K8s Secrets
Best when you have a Secret-injection sidecar (Vault Agent, csi-secrets-store)
or just a CI pipeline that applies Secrets out-of-band with kubectl apply.
Step 1: create the Secrets
kubectl -n caipe create secret generic caipe-keycloak-admin \
--from-literal=username=admin \
--from-literal=password="$(openssl rand -hex 32)"
# Only if upstream IdP enabled
kubectl -n caipe create secret generic caipe-keycloak-idp \
--from-literal=IDP_CLIENT_SECRET="<value-from-okta-app>"
kubectl -n caipe create secret generic caipe-keycloak-ui-client \
--from-literal=OIDC_CLIENT_SECRET="<same-secret-used-by-caipe-ui>"
kubectl -n caipe create secret generic caipe-keycloak-platform-client \
--from-literal=OIDC_CLIENT_SECRET="$(openssl rand -hex 32)"
kubectl -n caipe create secret generic caipe-keycloak-bot \
--from-literal=KC_BOT_CLIENT_SECRET="$(openssl rand -hex 32)"
Step 2: point the chart at them
keycloak:
  admin:
    secretRef: caipe-keycloak-admin   # chart will NOT create one
  idp:
    enabled: true
    alias: okta
    displayName: "Okta SSO"
    issuer: "https://example.okta.com"
    clientId: caipe-okta
    accessGroup: ""
    adminGroup: ""
    secretRef: caipe-keycloak-idp
  uiClient:
    secretRef: caipe-keycloak-ui-client
  platformClient:
    secretRef: caipe-keycloak-platform-client
  tokenExchange:
    enabled: true
    secretRef: caipe-keycloak-bot
# Wire slack-bot to the SAME bot Secret: single source of truth.
slack-bot:
  oauth2:
    clientSecretFromSecret:
      name: caipe-keycloak-bot
      key: KC_BOT_CLIENT_SECRET
helm install caipe ./charts/ai-platform-engineering -n caipe -f values.yaml
Path 3: PROD with External Secrets Operator (ESO)
Best when secrets live in Vault / AWS Secrets Manager / GCP Secret
Manager / Azure Key Vault, etc. and you want the chart itself to
reconcile them into K8s without anyone running kubectl create secret.
Prerequisites
- external-secrets-operator installed in the cluster.
- A ClusterSecretStore (or SecretStore) configured for your backend. Examples below assume a store named vault-backend. You can verify with:
kubectl get clustersecretstore vault-backend -o yaml
values.yaml
keycloak:
  externalSecretsApiVersion: v1beta1   # bump when ESO promotes to v1
  admin:
    username: admin   # username is non-sensitive; OK in values
    externalSecret:
      enabled: true
      refreshInterval: "1h"
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      remoteRefs:
        username:
          key: secret/data/prod/keycloak
          property: admin_username
        password:
          key: secret/data/prod/keycloak
          property: admin_password
  idp:
    enabled: true
    alias: okta
    displayName: "Okta SSO"
    issuer: "https://example.okta.com"
    clientId: caipe-okta
    accessGroup: ""
    adminGroup: ""
    externalSecret:
      enabled: true
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      remoteRef:
        key: secret/data/prod/keycloak
        property: idp_client_secret
  uiClient:
    externalSecret:
      enabled: true
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      remoteRef:
        key: secret/data/prod/caipe-ui
        property: oidc_client_secret
  platformClient:
    externalSecret:
      enabled: true
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      remoteRef:
        key: secret/data/prod/caipe-platform
        property: oidc_client_secret
  tokenExchange:
    enabled: true
    externalSecret:
      enabled: true
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      remoteRef:
        key: secret/data/prod/keycloak
        property: bot_client_secret
# Single source of truth: slack-bot pulls OAUTH2_CLIENT_SECRET from
# the SAME ESO-managed Secret that init-token-exchange wrote into Keycloak.
slack-bot:
  oauth2:
    clientSecretFromSecret:
      name: caipe-keycloak-bot
      key: KC_BOT_CLIENT_SECRET
Install
helm install caipe ./charts/ai-platform-engineering -n caipe -f values.yaml
ESO will:
- See the ExternalSecret CRs the chart emits (five with the values above: admin, idp, ui-client, platform-client, and bot).
- Reconcile each into a K8s Secret with the right name & keys.
The init Jobs (Helm post-install hooks) then wait for the Secrets and run: init-token-exchange.sh PUTs KC_BOT_CLIENT_SECRET to Keycloak so the value lives in exactly one place, your secrets backend.
Rotation
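# In Vault. Use kv patch, not kv put: the same path also holds
# admin_username, admin_password and idp_client_secret, and kv put
# would replace the whole secret with only the field given here.
vault kv patch secret/prod/keycloak \
  bot_client_secret="$(openssl rand -hex 32)"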
# Within `refreshInterval` (1h above), ESO updates the K8s Secret.
# Then trigger init-token-exchange to push the new value to Keycloak:
helm upgrade caipe ./charts/ai-platform-engineering -n caipe -f values.yaml
# (or `kubectl create job --from=cronjob/...` if you've cronified the Job)
# Bot pods need to be restarted to pick up the new env var.
# Add stakater/Reloader annotations to the slack-bot deployment if you
# want this fully automatic on Secret change.
Reference: which template owns which Secret?
charts/ai-platform-engineering/charts/keycloak/templates/
├── secret.yaml                            # admin Secret (literal): only when no secretRef AND no ESO
├── external-secret.yaml                   # admin ExternalSecret: when admin.externalSecret.enabled
│                                          #   OR legacy externalSecrets.enabled
├── idp-external-secret.yaml               # idp ExternalSecret: when idp.externalSecret.enabled
├── ui-client-external-secret.yaml         # caipe-ui ExternalSecret: when uiClient.externalSecret.enabled
├── platform-client-external-secret.yaml   # caipe-platform ExternalSecret: when platformClient.externalSecret.enabled
├── bot-secret.yaml                        # bot Secret (random): when no secretRef AND no ESO
├── bot-external-secret.yaml               # bot ExternalSecret: when tokenExchange.externalSecret.enabled
├── job-init-idp.yaml                      # consumes admin + (optional) idp/ui/platform Secrets
├── job-auth-reconcile.yaml                # PreSync/post-install reconcile; same env vars as init-idp
└── job-init-token-exchange.yaml           # consumes admin + (optional) bot Secret; PUTs bot secret to Keycloak
The _helpers.tpl defines keycloak.adminSecretName, keycloak.idpSecretName,
keycloak.uiClientSecretName, keycloak.platformClientSecretName, and
keycloak.botSecretName. All five honor their *.secretRef override so an
external Secret name flows through all consumers without duplication.
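The override rule can be pictured with a few lines of Python. This is a sketch of the naming behaviour described in this guide, not the actual template code: the function names are hypothetical, and the fallback pattern <release>-keycloak-<suffix> is inferred from the Secret table at the top of the page.

```python
# Assumption: suffixes mirror the five Secrets listed at the top of this guide.
SUFFIXES = ("admin", "idp", "ui-client", "platform-client", "bot")


def secret_name(release: str, suffix: str, secret_ref: str = "") -> str:
    """Return the Secret name every consumer should reference.

    A non-empty secretRef wins; otherwise fall back to the
    <release>-keycloak-<suffix> default.
    """
    return secret_ref or f"{release}-keycloak-{suffix}"


def resolve_all(release: str, overrides: dict[str, str]) -> dict[str, str]:
    """Resolve all five names, applying any per-Secret secretRef override."""
    return {s: secret_name(release, s, overrides.get(s, "")) for s in SUFFIXES}
```

Because every Job and Deployment asks the same helper for the name, setting one secretRef is enough to redirect all consumers of that Secret.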
Migrating from the dev placeholder (caipe-platform)
The caipe-platform confidential client in realm-config.json ships with
the literal placeholder caipe-platform-dev-secret. That value was, and
still is for local dev, perfectly fine: docker-compose stacks rely on
it, the integration test matrix relies on it, and Helm's first-boot import
needs something in the realm JSON for the client to be created.
In production, however, the rendered realm ConfigMap is plaintext. Anyone
with kubectl get cm -o yaml permission in the Keycloak namespace can
read the dev secret and mint client_credentials tokens for the
supervisor's audience: exactly the same risk class we already mitigated
for caipe-ui and the bot OBO clients.
The init-idp (post-install) and auth-reconcile (PreSync) Jobs now
re-read KEYCLOAK_PLATFORM_CLIENT_SECRET on every run and PUT it to
the caipe-platform client via the Admin API, so the real secret never
lives in a ConfigMap. The script behaves exactly like the existing
caipe-ui reconciliation:
- env var unset → no change, dev placeholder stays in place (current behaviour);
- env var set → reconciles on every Job run, idempotent against the same value, transparent rotation on change.
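The two branches can be sketched as follows. This is an illustration of the decision described above, not a port of init-idp.sh: the function name, the put_secret callback (standing in for the Admin API PUT), and the returned status strings are all hypothetical.

```python
from typing import Callable, Optional


def reconcile_client_secret(
    desired: Optional[str],
    put_secret: Callable[[str], None],
) -> str:
    """Apply the opt-in reconcile rule for one client.

    desired    -- value of KEYCLOAK_PLATFORM_CLIENT_SECRET, or None/"" if unset
    put_secret -- hypothetical callback that PUTs the secret to Keycloak
    """
    if not desired:
        # Env var unset: touch nothing, the dev placeholder stays live.
        return "unchanged"
    # Env var set: always PUT. Writing the same value twice leaves Keycloak
    # in the same state, which is what makes the Job safe to re-run.
    put_secret(desired)
    return "reconciled"
```

Rotation falls out of the same path: change the value in the Secret, re-run the Job, and the PUT carries the new value.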
Existing installs
Pick one of the two paths below. Both produce the same end-state and can be rolled out without downtime: the supervisor reads the secret from a mounted Secret, not from Keycloak, so the cutover is just:
- Decide the new secret value (or reuse what your existing supervisor pod is already configured with; see Pre-flight below).
- Make sure the supervisor's KEYCLOAK_CLIENT_SECRET and the new Keycloak Secret hold the same value.
- helm upgrade → auth-reconcile runs first (PreSync) and pushes the value into Keycloak.
- The placeholder in the realm ConfigMap becomes inert (Keycloak only reads the realm import on a fresh database).
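One way to check the "same value" step before upgrading is to compare the two Secrets offline. The sketch below assumes you have already saved each Secret with kubectl get secret <name> -o json and loaded it into a dict; the function names are hypothetical, and the key names are the ones used in this guide.

```python
import base64


def secret_value(secret: dict, key: str) -> str:
    """Decode one key from a Secret manifest (the .data values are base64)."""
    return base64.b64decode(secret["data"][key]).decode()


def values_match(supervisor_secret: dict, keycloak_secret: dict) -> bool:
    """True when the supervisor and the Keycloak-side Secret agree."""
    return (
        secret_value(supervisor_secret, "KEYCLOAK_CLIENT_SECRET")
        == secret_value(keycloak_secret, "OIDC_CLIENT_SECRET")
    )
```

A False result means the supervisor would present a secret Keycloak no longer accepts once auth-reconcile has run.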
Pre-flight: what secret is the supervisor currently using?
If your supervisor is running, check what it thinks the client_secret is. The simplest, safest check is:
# Resolve via whatever variable / secret the supervisor reads.
# In the default chart, that's KEYCLOAK_CLIENT_SECRET on the
# supervisor (caipe-platform) deployment.
kubectl -n caipe get deploy ai-platform-engineering-supervisor \
-o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="KEYCLOAK_CLIENT_SECRET")]}'
If the result references caipe-platform-dev-secret, you are running
on the placeholder today; start from a fresh random value below.
Otherwise reuse the existing value so no other pod restarts are needed.
Path A: manually managed K8s Secret
# 1. Pick (or reuse) the real client secret.
PLATFORM_CLIENT_SECRET="$(openssl rand -hex 32)" # or: existing value
# 2. Create the K8s Secret the init Jobs will read.
kubectl -n caipe create secret generic caipe-keycloak-platform-client \
--from-literal=OIDC_CLIENT_SECRET="${PLATFORM_CLIENT_SECRET}"
# 3. (If different) update the supervisor's own client_secret source
# to the SAME value, so no pod is mismatched after step 4.
# e.g. if the supervisor reads from caipe-platform-secret:
kubectl -n caipe create secret generic caipe-platform-secret \
--from-literal=KEYCLOAK_CLIENT_SECRET="${PLATFORM_CLIENT_SECRET}" \
--dry-run=client -o yaml | kubectl apply -f -
Add to your values.yaml:
keycloak:
  platformClient:
    secretRef: caipe-keycloak-platform-client
Run the upgrade:
helm upgrade caipe ./charts/ai-platform-engineering -n caipe -f values.yaml
The auth-reconcile Job logs
[init-idp] Reconciling client_secret on 'caipe-platform' from KEYCLOAK_PLATFORM_CLIENT_SECRET ...
[init-idp] caipe-platform client_secret reconciled.
If you instead see a line reporting KEYCLOAK_PLATFORM_CLIENT_SECRET not set and the caipe-platform client_secret left unchanged, the env var did not propagate. Check that
platformClient.secretRef or platformClient.externalSecret.enabled is set,
re-render with helm template … | grep KEYCLOAK_PLATFORM_CLIENT_SECRET, and
confirm the Secret exists in the same namespace as the Job.
Path B: External Secrets Operator
Store the value in your secrets backend (Vault / AWS-SM / GCP-SM / Azure Key Vault), then enable the chart-emitted ExternalSecret:
keycloak:
  platformClient:
    externalSecret:
      enabled: true
      secretStoreRef:
        name: vault-backend
        kind: ClusterSecretStore
      remoteRef:
        key: secret/data/prod/caipe-platform
        property: oidc_client_secret
ESO reconciles <release>-keycloak-platform-client from the backend; the
init Jobs read it and push it to Keycloak on the next helm upgrade /
ArgoCD sync.
New installs
Just include platformClient.secretRef (or platformClient.externalSecret)
from day one; the dev placeholder will be overwritten on the very first
init-idp Job, before any supervisor pod has ever produced a token.
Rotation
# 1. Write the new value into Vault (or whichever backend).
vault kv put secret/prod/caipe-platform oidc_client_secret="$(openssl rand -hex 32)"
# 2. Within ESO's refreshInterval the K8s Secret is updated.
# 3. Re-run init-idp / auth-reconcile so Keycloak picks up the new value.
helm upgrade caipe ./charts/ai-platform-engineering -n caipe -f values.yaml
# 4. Restart the supervisor pod so it reads the new value (or use
# stakater/Reloader on the supervisor deployment for hands-off rotation).
kubectl -n caipe rollout restart deploy/ai-platform-engineering-supervisor
The order matters: always rotate Keycloak first, then the consumer. The init Job's PUT is atomic from the Admin API's perspective; in-flight tokens issued under the previous secret remain valid until expiry.
Verifying the reconcile path locally
A throwaway end-to-end test ships in the repo. It boots Keycloak 26.3 in
Docker, imports the chart's realm-config.json, runs init-idp.sh with
and without KEYCLOAK_PLATFORM_CLIENT_SECRET set, and asserts that:
- the dev placeholder is in place after import,
- the env-unset path is a no-op,
- the env-set path reconciles, is idempotent, and the new secret mints client_credentials tokens,
- the old placeholder is rejected after rotation (no leftover acceptance of the dev value).
# Default port is 18080; override if it clashes with your local stack:
make test-keycloak-reconcile # uses port 28080
KC_PORT=29080 make test-keycloak-reconcile # custom port
KEEP=1 KC_PORT=29080 ./tests/integration/test_keycloak_platform_client_reconcile.sh
# (KEEP=1 leaves the Keycloak container running for manual poking)
Runs in ~16 seconds on a warm cache. Wired into CI via
.github/workflows/ci-keycloak-init.yml (reconcile-test job) and blocks
the release-notification path so a broken reconcile fails the build.
Rollback
If anything goes wrong you can fall back to the dev placeholder without re-deploying Keycloak:
# Roll back the chart values to remove platformClient.secretRef /
# externalSecret, then run init-idp by hand:
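# --data-urlencode keeps special characters in the admin password intact.
KC_TOKEN=$(curl -sf -X POST "$KC_URL/realms/master/protocol/openid-connect/token" \
  -d "grant_type=password" -d "client_id=admin-cli" -d "username=admin" \
  --data-urlencode "password=$ADMIN_PWD" \
  | jq -r .access_token)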
CLIENT_UUID=$(curl -sf -H "Authorization: Bearer $KC_TOKEN" \
"$KC_URL/admin/realms/caipe/clients?clientId=caipe-platform" | jq -r '.[0].id')
curl -sf -X PUT \
-H "Authorization: Bearer $KC_TOKEN" \
-H "Content-Type: application/json" \
"$KC_URL/admin/realms/caipe/clients/$CLIENT_UUID" \
-d '{"clientId":"caipe-platform","secret":"caipe-platform-dev-secret"}'
(This restores the state a freshly imported realm is in when
KEYCLOAK_PLATFORM_CLIENT_SECRET is empty and init-idp.sh leaves the
client untouched.)
Production hardening: strict client-secret mode
TL;DR: Set keycloak.strictClientSecrets: true in your production values. The init Jobs will then fail loudly if Keycloak still accepts any of the four dev placeholder client_secrets, instead of silently leaving them live.
The realm import in charts/.../keycloak/realm-config.json ships four
confidential clients with dev-only placeholder secrets so the realm can
import cleanly on first boot:
| Client | Placeholder secret | Reconciled by | Real source |
|---|---|---|---|
| caipe-ui | caipe-ui-dev-secret | init-idp.sh | uiClient.* |
| caipe-platform | caipe-platform-dev-secret | init-idp.sh | platformClient.* |
| caipe-slack-bot | caipe-slack-bot-dev-secret | init-token-exchange.sh | tokenExchange.* |
| caipe-webex-bot | caipe-webex-bot-dev-secret | init-token-exchange.sh | webexTokenExchange.* |
Each reconciliation helper is opt-in: it only PUTs a new value to
Keycloak when the matching env var is present in the Job. If an operator
forgets to set the matching secretRef / externalSecret, the
reconciliation silently logs "leaving … unchanged", and Keycloak
keeps accepting the dev placeholder forever.
What strict mode does
strictClientSecrets: true adds a final guard to each init script:
- init-idp.sh issues a client_credentials request to Keycloak using the dev placeholders for caipe-ui and caipe-platform.
- init-token-exchange.sh does the same for caipe-slack-bot and caipe-webex-bot.
If Keycloak returns an access_token for any of them, the Job logs:
[init-idp] ERROR: Keycloak still accepts the dev placeholder client_secret for 'caipe-ui'.
[init-idp] Set the matching secretRef or externalSecret for this client and retry.
[init-idp] Strict mode FAILED: 1 client(s) still accept dev placeholder secrets.
The Job then exits non-zero, which fails the Helm install / GitOps sync. The
operator has to fix the missing secretRef / externalSecret and
re-sync before the realm becomes usable.
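The guard boils down to "try each placeholder, fail if any still works". The sketch below captures that logic under stated assumptions: the function names are hypothetical, and mints_token stands in for the real client_credentials request so the example runs without Keycloak. The placeholder values are the ones from the table above.

```python
from typing import Callable, Iterable

# Dev placeholders as listed in the table above.
PLACEHOLDERS = {
    "caipe-ui": "caipe-ui-dev-secret",
    "caipe-platform": "caipe-platform-dev-secret",
    "caipe-slack-bot": "caipe-slack-bot-dev-secret",
    "caipe-webex-bot": "caipe-webex-bot-dev-secret",
}


def still_accepting(
    clients: Iterable[str],
    mints_token: Callable[[str, str], bool],
) -> list[str]:
    """Return the clients for which the dev placeholder still yields a token.

    mints_token(client_id, secret) is a hypothetical probe that returns True
    when Keycloak answers the client_credentials request with an access_token.
    """
    return [c for c in clients if mints_token(c, PLACEHOLDERS[c])]


def strict_mode_exit_code(
    clients: Iterable[str],
    mints_token: Callable[[str, str], bool],
) -> int:
    """Non-zero when at least one placeholder is still live."""
    return 1 if still_accepting(clients, mints_token) else 0
```

Injecting the probe keeps the decision testable on its own; in the real scripts the equivalent check runs against the live realm at the end of each Job.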
Enabling it
Append to your prod values:
keycloak:
  strictClientSecrets: true
  # Must also set ALL FOUR client_secret sources or strict mode will fail.
  uiClient:
    secretRef: caipe-keycloak-ui-client           # or .externalSecret
  platformClient:
    secretRef: caipe-keycloak-platform-client     # or .externalSecret
  tokenExchange:
    secretRef: caipe-keycloak-bot                 # or .externalSecret
  webexTokenExchange:
    secretRef: caipe-keycloak-webex-bot           # or .externalSecret
The default is false to keep the docker-compose.dev.yaml flow and
the CI matrix runs (which intentionally use placeholders) working
unchanged. Turn it on only for production / customer-facing
clusters.
Verifying strict mode locally
A second end-to-end integration test ships in the repo,
tests/integration/test_keycloak_strict_client_secrets.sh. It boots a
throwaway Keycloak, sanity-checks that all four placeholders are
initially accepted, then exercises both the failure path (strict mode
without rotation env vars) and the success path (rotation + strict mode
passes). Final step asserts all four placeholders are rejected
after the full reconcile.
make test-keycloak-strict-secrets
# or, with a custom port:
KC_PORT=29080 ./tests/integration/test_keycloak_strict_client_secrets.sh
Runs in ~16 seconds on a warm cache. Wired into CI via the same
.github/workflows/ci-keycloak-init.yml workflow as the platform
reconcile test, and blocks the release-notification path.
Recommended adoption order
- Adopt secretRef / externalSecret for one client at a time, with strictClientSecrets: false.
- Verify each rotation works (see the per-client "Verifying the reconcile path locally" sections above).
- Once all four are wired, set strictClientSecrets: true in the same PR that bumps your chart values. The Helm upgrade will fail immediately if any of them is still missing, which is the entire point.
SSO bootstrap: kc_idp_hint and the IdP redirector
CAIPE's SSO experience is built on Keycloak's identity provider redirector: instead of presenting Keycloak's own username/password page, the realm bounces the browser straight to the upstream IdP (Okta / Duo SSO / Azure AD / Cisco Customer Identity / …). Two pieces of machinery together produce that behaviour, and they each have a dedicated test that locks the contract in CI.
How the redirect path is wired
- init-idp.sh runs the Setting '<alias>' as default IdP redirector block. That looks up the realm's identity-provider-redirector execution in the browser flow and attaches an authentication-config entry whose config.defaultProvider == ${IDP_ALIAS}. The result: anyone hitting /realms/caipe/protocol/openid-connect/auth is funnelled to /realms/caipe/broker/${IDP_ALIAS}/login.
- When KEYCLOAK_FORCE_IDP_REDIRECT=true, the same script then runs Enforcing '<alias>' as the only browser login path, which flips the redirector execution requirement from ALTERNATIVE to REQUIRED and drops the local forms execution. The browser can no longer fall back to a username/password prompt; every login must round-trip through the upstream IdP.
- The UI's NextAuth provider in ui/src/lib/auth-config.ts injects kc_idp_hint=${OIDC_IDP_HINT} into the authorization params only when OIDC_IDP_HINT is set. That lets the BFF override Keycloak's default redirector per-deployment (e.g. force a specific alias in prod, leave it unset in a dev box). The conditional spread is important: passing kc_idp_hint="" confuses some Keycloak builds.
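The conditional spread in the last bullet is easy to get subtly wrong, so here is the same idea in a few lines of Python. The real code is TypeScript in auth-config.ts; this sketch only mirrors the rule, and the function name and the base scope parameter are assumptions for illustration.

```python
from typing import Mapping


def authorization_params(env: Mapping[str, str]) -> dict[str, str]:
    """Build the extra authorization params sent to Keycloak.

    kc_idp_hint is added only when OIDC_IDP_HINT is non-empty, so an unset
    or empty variable never produces kc_idp_hint="" on the wire.
    """
    params = {"scope": "openid"}  # assumption: illustrative base parameter
    hint = env.get("OIDC_IDP_HINT", "")
    if hint:
        params["kc_idp_hint"] = hint
    return params
```

Treating empty and unset the same way matches what the UI unit test pins: forwarded when set, omitted when unset or empty.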
Verifying the kc_idp_hint plumbing locally
Two complementary tests cover the path:
- UI unit test (ui/src/lib/__tests__/auth-config.test.ts, describe OIDC kc_idp_hint forwarding) pins the conditional spread: hint forwarded when OIDC_IDP_HINT is set, omitted when it is unset or empty.
cd ui && npx jest --testPathPatterns auth-config.test.ts -t "kc_idp_hint"
- End-to-end integration test (tests/integration/test_keycloak_idp_hint_redirect.sh) boots a throwaway Keycloak, runs init-idp.sh against it with KEYCLOAK_FORCE_IDP_REDIRECT=true, and asserts:
  - The redirector requirement is REQUIRED and its default provider is the configured alias.
  - GET /realms/caipe/protocol/openid-connect/auth (no hint) returns 302/303 to /broker/${IDP_ALIAS}/login.
  - GET /realms/caipe/protocol/openid-connect/auth?kc_idp_hint=${IDP_ALIAS} produces the same broker redirect.
  - GET …?kc_idp_hint=does-not-exist does not 5xx; Keycloak gracefully falls back rather than crashing the realm.
  - A second init-idp.sh run with KEYCLOAK_FORCE_IDP_REDIRECT=false leaves the default provider wired (regression guard).
make test-keycloak-idp-hint
# or with a custom port:
KC_PORT=21080 ./tests/integration/test_keycloak_idp_hint_redirect.sh
# or run all Keycloak SSO bootstrap tests together:
make test-keycloak-sso-all
Runs in ~18 seconds on a warm Docker cache. Wired into CI via the idp-hint-test job in .github/workflows/ci-keycloak-init.yml, and blocks the release-notification path alongside the reconcile and strict-secrets tests.
Known issue exposed by the test
The integration test currently observes that, in some POSIX-shell builds,
the failure reported by the bundled init-idp.sh log line
WARNING: failed to update IdP redirector config. is not cosmetic: it
actually prevents the default provider from being wired on a fresh realm.
The test compensates by writing the authenticationConfig directly via the
Admin API before running the redirect assertions, so the Keycloak behaviour
is still validated end-to-end. The shell-side fix (switching the read -r
parse off whitespace-only IFS) is tracked separately.
Dynamic-agents Keycloak/OIDC env-var contract
The dynamic-agents service validates every incoming Bearer token against
Keycloak's JWKS endpoint
(ai_platform_engineering/dynamic_agents/src/dynamic_agents/auth/jwks_validate.py).
Two env vars drive that flow and are the most common source of "all
requests 401" in production-shaped deployments:
| Env var | Purpose | Default when unset |
|---|---|---|
| KEYCLOAK_URL | In-cluster URL the pod uses for the server-to-server JWKS fetch. | http://localhost:7080 → connection-refused inside a pod |
| OIDC_ISSUER | Public issuer string that MUST match the iss claim Keycloak puts in JWTs. | Derived from KEYCLOAK_URL: fine in dev (where they coincide), wrong in any deployment where the in-cluster service URL differs from the browser-facing issuer (Keycloak's KC_HOSTNAME) |
Both can be overridden by the alternates KEYCLOAK_ISSUER (legacy
OIDC_ISSUER alias) and KEYCLOAK_JWKS_URL / OIDC_JWKS_URL (full
override of the derived JWKS URI). Resolution order is pinned by
ai_platform_engineering/dynamic_agents/tests/test_jwks_validate.py.
Failure mode this prevents
The validator computes the expected issuer from OIDC_ISSUER if set,
otherwise from KEYCLOAK_URL. When OIDC_ISSUER is unset and
KEYCLOAK_URL is the in-cluster service (e.g.
http://ai-platform-engineering-keycloak:8080), the validator expects
issuer http://ai-platform-engineering-keycloak:8080/realms/caipe, but
Keycloak bakes whatever KC_HOSTNAME was configured with into the JWT's
iss claim (e.g. https://idp.example.com/realms/caipe). PyJWT then
rejects the token with InvalidIssuerError, and every authenticated
request lands in the 401 path. Pod logs look like JWKS fetch is healthy
(it is; that's KEYCLOAK_URL's job) but tokens still fail validation.
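The mismatch is easiest to see by computing the expected issuer by hand. The sketch below covers only the two variables described in the paragraph above (OIDC_ISSUER first, then KEYCLOAK_URL); it deliberately leaves out the KEYCLOAK_ISSUER alias and the JWKS URL overrides, whose exact precedence is defined by the test suite rather than by this page. The function name and the realm default are assumptions.

```python
from typing import Mapping

DEFAULT_KEYCLOAK_URL = "http://localhost:7080"  # default from the table above


def expected_issuer(env: Mapping[str, str], realm: str = "caipe") -> str:
    """Issuer the validator would compare against the token's iss claim."""
    explicit = env.get("OIDC_ISSUER", "")
    if explicit:
        return explicit
    # No explicit issuer: derive it from the in-cluster Keycloak URL.
    base = env.get("KEYCLOAK_URL", DEFAULT_KEYCLOAK_URL).rstrip("/")
    return f"{base}/realms/{realm}"
```

With only KEYCLOAK_URL=http://ai-platform-engineering-keycloak:8080 set, the derived issuer is the in-cluster URL, which cannot equal a token whose iss is https://idp.example.com/realms/caipe; setting OIDC_ISSUER to the public value closes the gap.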
Chart wiring
The dynamic-agents subchart exposes both keys under config:
dynamic-agents:
  config:
    KEYCLOAK_URL: "http://ai-platform-engineering-keycloak:8080"
    OIDC_ISSUER: "https://idp.example.com/realms/caipe"
The umbrella charts/ai-platform-engineering/values.yaml defaults
KEYCLOAK_URL to the bundled Keycloak service so a vanilla install
works out of the box; OIDC_ISSUER is intentionally left empty because
it depends on the deployment's public hostname.
The subchart's templates/configmap.yaml skips keys whose value is the
empty string. That way leaving OIDC_ISSUER: "" in values.yaml does
NOT clobber the in-code default with an empty env var (which would be a
sharper failure mode than just unsetting it). To opt out of the
in-code default explicitly, set the value to a single space.
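In Python terms the skip rule is a one-line filter. This is only a model of the behaviour described above, not the Helm template itself, and the function name is hypothetical.

```python
def configmap_data(config: dict[str, str]) -> dict[str, str]:
    """Keep only keys whose value is not the empty string.

    A single space is a non-empty string, so it survives the filter and
    reaches the pod as an explicit (blank) env var.
    """
    return {key: value for key, value in config.items() if value != ""}
```

The distinction matters because an omitted key lets the application fall back to its in-code default, while a present-but-blank key overrides it.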
templates/NOTES.txt prints a non-blocking warning at helm install /
helm upgrade time when either is empty, with the exact YAML snippet
to drop into the operator's overrides. Three warning branches exist
(both empty, only OIDC_ISSUER empty, only KEYCLOAK_URL empty) so
the message is always actionable.
Test coverage
| Layer | Test | What it pins |
|---|---|---|
| Vendored validator (the actual runtime path) | ai_platform_engineering/dynamic_agents/tests/test_jwks_validate.py | _kc_base_url(), _kc_issuer(), _kc_jwks_uri() resolution order + end-to-end validation: token with public issuer + OIDC_ISSUER set → valid; same token without OIDC_ISSUER → InvalidIssuerError; explicit KEYCLOAK_JWKS_URL / OIDC_JWKS_URL overrides win |
| Helm chart | tests/test_dynamic_agents_chart_keycloak_env.py | Both keys propagate into the rendered ConfigMap when set; empty defaults are omitted; NOTES.txt warns on every empty-value branch; umbrella chart defaults KEYCLOAK_URL to the bundled Keycloak service |
| Shared validator (used by supervisor + RAG, not dynamic-agents) | tests/rbac/unit/py/test_jwks_validate.py | Same contract pinned for ai_platform_engineering.utils.auth.jwks_validate |
The vendored and shared copies MUST stay in sync: if you change one, update the other and run both test suites.
R1: BFF Keycloak Admin token: production-safety gate
Problem
ui/src/lib/rbac/keycloak-admin.ts::fetchFreshAdminToken is the BFF's
sole code path for getting an Admin REST token. Historically, if
KEYCLOAK_ADMIN_CLIENT_ID / KEYCLOAK_ADMIN_CLIENT_SECRET were unset
(which is the default everywhere: the chart never sets them, see the
"current gap" rows in file-map.md), the BFF silently fell back to a
password grant against /realms/master with the literal credentials
username=admin, password=admin.
In a cluster where the Keycloak bootstrap admin password is still the default, that's master-realm admin escalation from the BFF. In Kevin's install it just 401'd (because the bootstrap admin had been rotated), but the code path is still wrong: the BFF was attempting an escalation it shouldn't have been allowed to attempt in the first place.
Failure mode
| Operator state | What happened before R1 | What happens after R1 |
|---|---|---|
| client_credentials env unset, bootstrap admin = default password (admin/admin) | Silent success against /realms/master → BFF gets a master-realm admin token | Throws at the first admin call: Keycloak admin credentials missing: ... ALLOW_KEYCLOAK_ADMIN_PASSWORD_FALLBACK=true to opt in for local dev only |
| client_credentials env unset, bootstrap admin rotated | 401 from /realms/master (Kevin's symptom) | Same throw as above; no /realms/master call attempted at all |
| client_credentials env set but pointing at stale caipe-platform-dev-secret, master rotated | 401 from realm → fallback → 401 from /realms/master (Keycloak token (password (admin-cli)) failed: 401 invalid_grant) | Realm 401 propagates verbatim; no master fallback |
Gate semantics
The gate lives in adminPasswordFallbackAllowed() and follows this
precedence:
| ALLOW_KEYCLOAK_ADMIN_PASSWORD_FALLBACK | NODE_ENV | Fallback allowed? |
|---|---|---|
| "true" or "1" | any | yes (explicit opt-in) |
| "false" or "0" | any | no (explicit opt-out) |
| empty / unset | production | no (the safe default in the chart) |
| empty / unset | anything else (development, test, …) | yes (docker-compose dev default) |
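The precedence in the table reads naturally as a small function. The real gate is TypeScript in adminPasswordFallbackAllowed(); the Python below is a sketch of the table only, and how the real code treats values outside the four listed strings (for example "yes") is not stated here, so the sketch assumes they behave like "unset".

```python
from typing import Mapping


def admin_password_fallback_allowed(env: Mapping[str, str]) -> bool:
    """Mirror the gate table: explicit flag first, then NODE_ENV."""
    flag = env.get("ALLOW_KEYCLOAK_ADMIN_PASSWORD_FALLBACK", "")
    if flag in ("true", "1"):
        return True   # explicit opt-in, regardless of NODE_ENV
    if flag in ("false", "0"):
        return False  # explicit opt-out, regardless of NODE_ENV
    # Assumption: any other value is treated like empty / unset.
    return env.get("NODE_ENV", "") != "production"
```

Note that the explicit flag always wins, so setting it to "true" in a production values file reopens the fallback even with NODE_ENV=production.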
Chart wiring
The caipe-ui subchart exposes both keys under config:
caipe-ui:
  config:
    NODE_ENV: "production"   # chart default; do NOT change for prod installs
    ALLOW_KEYCLOAK_ADMIN_PASSWORD_FALLBACK: ""   # default: gate closed in prod
For a throwaway dev cluster where the operator has accepted the risk (e.g. the Keycloak bootstrap admin password is known not to be the default), the flag can be flipped on:
caipe-ui:
  config:
    ALLOW_KEYCLOAK_ADMIN_PASSWORD_FALLBACK: "true"
docker-compose.dev.yaml defaults the flag to true (and runs with
NODE_ENV=development anyway) so the local stack continues to boot a
fresh Keycloak without requiring the operator to first wire a service
account.
How to do this properly in production
Don't rely on the fallback; give the BFF a real service-account client.
As of the R1 upstream fix (May 2026, see below), the chart now does
this auto-wiring out of the box, with no manual existingSecret plumbing
required. Operators only need to provide the caipe-platform-secret
itself (via K8s Secret or ESO).
R1 upstream fix (May 2026): chart auto-wires caipe-platform-secret
Before this fix, even operators who did everything right (rotated the
caipe-platform client secret in Keycloak via
keycloak.platformClient.secretRef) still had to manually plumb the
same secret into the caipe-ui pod, typically via an
extraDeploy: mirror Secret or a caipe-ui.existingSecret override
that surfaced the value under the wrong env-var name (OIDC_CLIENT_SECRET
instead of KEYCLOAK_ADMIN_CLIENT_SECRET). This was the half-fix Sri
documented at 3:41 PM in the Slack thread and the upstream gap Kevin
flagged at 4:09 PM.
The chart now does this automatically:
- The umbrella values.yaml defaults keycloak.platformClient.secretRef = "caipe-platform-secret" AND caipe-ui.keycloakAdminClient.secretName = "caipe-platform-secret".
- The caipe-ui Deployment template gained a keycloakAdminClient block that, when secretName is non-empty, injects KEYCLOAK_ADMIN_CLIENT_SECRET via valueFrom.secretKeyRef (key OIDC_CLIENT_SECRET) and KEYCLOAK_ADMIN_CLIENT_ID via the ConfigMap.
- The caipe-ui ConfigMap template skips its KEYCLOAK_ADMIN_CLIENT_ID auto-injection if the operator has already set it under caipe-ui.config.KEYCLOAK_ADMIN_CLIENT_ID (no-clobber).
The on-the-wire result of a helm install with just
keycloak.platformClient.secretRef=caipe-platform-secret (or even the
defaults alone) is now:
# caipe-ui Deployment env (rendered)
env:
  - name: KEYCLOAK_ADMIN_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: caipe-platform-secret
        key: OIDC_CLIENT_SECRET
# caipe-ui ConfigMap data (rendered)
KEYCLOAK_ADMIN_CLIENT_ID: caipe-platform
The BFF's fetchFreshAdminToken reads both vars and uses
client_credentials against the realm. It never calls /realms/master and
never uses admin/admin.
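For orientation, a client_credentials token request against a Keycloak realm has the shape sketched below. This is not the body of fetchFreshAdminToken: the helper name build_admin_token_request, the base URL, and the realm name "caipe" are all assumptions for illustration. The token path is Keycloak's standard OpenID Connect token endpoint.

```python
import urllib.parse
import urllib.request


def build_admin_token_request(base_url: str, realm: str,
                              client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client_credentials token request."""
    realm_path = urllib.parse.quote(realm, safe="")
    token_url = base_url.rstrip("/") + "/realms/" + realm_path + "/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # value of KEYCLOAK_ADMIN_CLIENT_ID
        "client_secret": client_secret,  # value of KEYCLOAK_ADMIN_CLIENT_SECRET
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_admin_token_request(
    "https://keycloak.example.com",  # hypothetical base URL
    "caipe",                         # hypothetical realm name
    "caipe-platform",
    "value-of-KEYCLOAK_ADMIN_CLIENT_SECRET",
)
```

Passing req to urllib.request.urlopen would send it; a successful response is a JSON document whose access_token field carries the bearer token. The request targets the application realm only, so a stale or wrong secret surfaces as a realm-level 401 rather than a retry against /realms/master.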
Migration path for existing users
| Persona | Old behaviour | New behaviour (after upgrade) | Action required |
|---|---|---|---|
| Path 1 (DEV): local docker-compose, no in-cluster Keycloak | BFF fell back to admin/admin against the bundled Keycloak; NODE_ENV=development kept the gate open | Unchanged at runtime (the gate is still open when NODE_ENV!=production). Chart render now also injects KEYCLOAK_ADMIN_CLIENT_SECRET pointing at caipe-platform-secret, but you won't notice unless you run helm template | None. Existing dev installs keep working. |
| Path 2 (PROD K8s Secrets): operator pre-created caipe-platform-secret and set keycloak.platformClient.secretRef=caipe-platform-secret | Realm reconciled correctly; BFF still missing KEYCLOAK_ADMIN_CLIENT_SECRET; fell back to admin/admin (which 401'd post-bootstrap-admin-rotation, Kevin's symptom) | BFF picks up the same Secret automatically; admin REST works on first call. Any manual caipe-ui.existingSecret=caipe-platform-secret or extraDeploy: patch you added as a workaround keeps working alongside the auto-wiring (Pin 2 covers this; there is no env-var collision because they're projected under different names) | None. Optional cleanup: remove the existingSecret workaround or the extraDeploy: mirror; both became redundant. |
| Path 2 (PROD K8s Secrets) with a custom secret name: operator pre-created my-custom-platform-secret and set keycloak.platformClient.secretRef=my-custom-platform-secret | Same as above (BFF missing creds) | The auto-wiring still defaults to caipe-platform-secret (the conventional name), so the BFF would point at a non-existent Secret and the pod fails to start with CreateContainerConfigError | Required: also set caipe-ui.keycloakAdminClient.secretName: my-custom-platform-secret. Helm cannot substitute the value across subchart boundaries; both halves must be set explicitly. |
| Path 3 (PROD ESO): operator set keycloak.platformClient.externalSecret.enabled=true with no secretRef override | ESO emitted a Secret named <release>-keycloak-platform-client; BFF had no wiring to it; fell back to admin/admin | ESO now emits the Secret as caipe-platform-secret (the umbrella default for platformClient.secretRef propagates into the helper that resolves the ESO target name). The BFF picks it up automatically | One-time migration: see "ESO target rename" below. Existing rotation pipelines continue to write to the upstream secret store unchanged; only the in-cluster Secret name changes. |
| Path 3 (PROD ESO) with a custom secret name: operator set both externalSecret.enabled=true AND keycloak.platformClient.secretRef=my-custom-platform-secret | Same as above | The ESO emits my-custom-platform-secret; the BFF defaults to caipe-platform-secret and would fail to start | Required: also set caipe-ui.keycloakAdminClient.secretName: my-custom-platform-secret, same as the PROD K8s Secrets case. |
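The custom-name rows come down to one rule: the two values must agree. A pre-flight check along the following lines could catch a mismatch before helm upgrade. It is a minimal sketch that inspects only the operator-supplied overrides (as a parsed dict) and assumes the caipe-platform-secret default for anything not set; the function name check_platform_secret_wiring is hypothetical and is not part of the chart or its tests.

```python
from typing import Any, Dict, List

DEFAULT_SECRET = "caipe-platform-secret"  # umbrella chart default per the table above


def check_platform_secret_wiring(values: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the two halves agree."""
    secret_ref = (values.get("keycloak", {})
                        .get("platformClient", {})
                        .get("secretRef", DEFAULT_SECRET))
    secret_name = (values.get("caipe-ui", {})
                         .get("keycloakAdminClient", {})
                         .get("secretName", DEFAULT_SECRET))
    problems: List[str] = []
    if secret_name == "":
        # Explicit empty string switches the BFF auto-wiring off.
        problems.append("caipe-ui.keycloakAdminClient.secretName is empty: "
                        "BFF auto-wiring is disabled")
    elif secret_ref and secret_ref != secret_name:
        problems.append(
            f"keycloak.platformClient.secretRef={secret_ref!r} but "
            f"caipe-ui.keycloakAdminClient.secretName={secret_name!r}: "
            "the caipe-ui pod would reference a Secret that does not exist"
        )
    return problems
```

An empty secretRef (the pre-upgrade helper behaviour) is deliberately not judged here, because the resolved Secret name then depends on the release name.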
Path 3: ESO target rename, one-time migration
If you previously ran on Path 3 with the default secretRef="", the
in-cluster Secret name changes from <release>-keycloak-platform-client
to caipe-platform-secret on the next helm upgrade. ESO will:
- Create a new Secret named caipe-platform-secret with the same payload (same OIDC_CLIENT_SECRET value from your secrets backend).
- Leave the old <release>-keycloak-platform-client Secret orphaned until you run kubectl delete secret/<release>-keycloak-platform-client (or until the release is reinstalled).
There is no service downtime because:
- The keycloak init Jobs read from keycloak.platformClientSecretName, which now resolves to caipe-platform-secret, and write the same value back into Keycloak via the Admin API.
- The caipe-ui Deployment reads from caipe-platform-secret via the new env wiring.
- The old <release>-keycloak-platform-client Secret has no consumers after the upgrade.
If your rotation pipeline or out-of-band tooling directly references
the old Secret name (e.g. kubectl get secret <release>-keycloak-platform-client),
update it to caipe-platform-secret. To preserve the old name verbatim,
pin it explicitly:
keycloak:
  platformClient:
    secretRef: "" # explicit empty: revert to old helper behaviour
caipe-ui:
  keycloakAdminClient:
    secretName: "" # disable the BFF auto-wiring (also re-introduces the
                   # R1 admin/admin fallback gap in production;
                   # NOT recommended; see "Gate semantics" above)
The recommended path is to let the rename happen and update tooling.
Standalone caipe-ui subchart users
If you install the caipe-ui subchart directly (not via the umbrella,
e.g. running caipe-ui against an external Keycloak with its own
credentials in existingSecret), the auto-wiring is off by default
in the standalone chart (caipe-ui.values.yaml ships
keycloakAdminClient.secretName: ""). No migration needed; behaviour
is unchanged.
Test coverage
| Layer | Test | What it pins |
|---|---|---|
| BFF unit (legacy fallback chain) | ui/src/lib/rbac/__tests__/keycloak-admin-token.test.ts (fallback chain describe) | 4 branches: admin client env unset → /realms/master succeeds; first call 401 → fallback succeeds; both 401 → exact Kevin error string; happy path → no /realms/master call |
| BFF unit (R1 gate) | ui/src/lib/rbac/__tests__/keycloak-admin-token.test.ts (production safety gate (R1) describe) | 4 branches: prod-default-deny with no /realms/master call; first-call-401 re-raises verbatim in prod; explicit =true re-enables fallback even in prod; explicit =false disables fallback even in dev |
| Helm chart (R1 BFF gate) | tests/test_caipe_ui_keycloak_admin_client_env.py (Pin 5, Pin 6) | Default chart render has NODE_ENV=production AND never defaults ALLOW_KEYCLOAK_ADMIN_PASSWORD_FALLBACK=true; operator-supplied override propagates to the ConfigMap |
| Helm chart (R1 upstream fix) | tests/test_caipe_ui_keycloak_admin_client_env.py (Pin 1-4, 7-9) | Default install auto-wires KEYCLOAK_ADMIN_CLIENT_ID/SECRET; explicit platformClient.secretRef and ESO paths render identically; operator override of keycloakAdminClient.secretName is honoured; explicit secretName="" skips the wiring cleanly (standalone caipe-ui); explicit config.KEYCLOAK_ADMIN_CLIENT_ID wins over auto-injection (no-clobber) |
If a regression silently re-enables the admin/admin fallback, the
production-gate describe will trip; if the chart ever defaults the flag
to true, Pin 5 will trip; if the R1 upstream auto-wiring breaks,
Pin 1 (default install) will trip. Together they pin the contract
end-to-end.
R4: NEXTAUTH_SECRET strict mode
Problem
NEXTAUTH_SECRET HS256-signs two distinct credential surfaces:
- NextAuth session cookies: used by every authenticated browser request to the BFF.
- Internal skills-API tokens: minted by ui/src/lib/jwt-validation.ts::signLocalSkillsToken, used by the Skills page and any external automation that exchanges a long-lived key for a Bearer token. (The token's iat, exp, and scope are all signed under the same key.)
If two operators ship the same value (very easy to do: caipe-dev-secret
was hardcoded in Makefile:199 and caipe-dev-secret-change-in-production
shipped as the default in docker-compose.yaml), then a session cookie
or skills-API token forged on install A is byte-for-byte valid on install B.
That's a one-line cross-install identity compromise.
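The cross-install problem follows directly from how HS256 works: whoever holds the secret can mint a token that any verifier holding the same secret accepts. The sketch below demonstrates this with a hand-rolled HS256 JWT using only the standard library. It is an illustration, not the code in jwt-validation.ts; the claim names and the "install-b" secret are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_hs256(token: str, secret: str) -> bool:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(_b64url(expected), sig)


SHARED = "caipe-dev-secret"  # the placeholder both installs shipped with
token_from_a = sign_hs256({"sub": "admin", "scope": "skills"}, SHARED)  # hypothetical claims

# Install B still uses the placeholder: the foreign token verifies.
install_b_accepts = verify_hs256(token_from_a, SHARED)
# Install B uses its own random secret: the same token is rejected.
unique_b_accepts = verify_hs256(token_from_a, "install-b-unique-random-secret")
```

Nothing in the token identifies which install minted it, so a unique per-install secret is the only thing separating the two trust domains.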
Strict-mode gate
The gate lives in ui/src/lib/nextauth-secret-guard.ts and is
consumed by jwt-validation.ts (both signLocalSkillsToken and
validateLocalSkillsJWT). It handles each input as follows:
| Input | Strict mode (prod) | Dev mode |
|---|---|---|
| Unset / empty | throws | throws |
| Known placeholder (caipe-dev-secret, changeme, your-secret-here, …) | throws | warns + accepts |
| Shorter than 32 characters | throws | accepts |
| ≥32 characters, not in the placeholder set | accepts | accepts |
The full placeholder set is the constant
KNOWN_NEXTAUTH_PLACEHOLDERS in nextauth-secret-guard.ts. Adding to
it is a one-way ratchet (we never remove entries); if a postmortem
surfaces a new leaked value, add it there.
Gate semantics
Strict mode follows the same ALLOW_* / NODE_ENV precedence as the
R1 gate:
| ALLOW_NEXTAUTH_DEV_SECRET | NODE_ENV | Strict mode? |
|---|---|---|
| "true" or "1" | any | off (explicit opt-out) |
| "false" or "0" | any | on (explicit opt-in) |
| empty / unset | production | on |
| empty / unset | anything else | off |
In strict mode, when the secret fails the guard, signLocalSkillsToken
throws at mint time and validateLocalSkillsJWT returns null (so the
caller falls through to OIDC instead of 5xx-ing the request).
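Taken together, the strict-mode switch and the per-input outcomes can be modelled as below. This is a Python sketch of the two tables, not the contents of nextauth-secret-guard.ts: the function names are hypothetical, and KNOWN_PLACEHOLDERS lists only the values named in this document, so the real KNOWN_NEXTAUTH_PLACEHOLDERS constant should be treated as the source of truth.

```python
from typing import Mapping, Optional

# Only the placeholders named in this document; the real constant has more entries.
KNOWN_PLACEHOLDERS = {
    "caipe-dev-secret",
    "caipe-dev-secret-change-in-production",
    "changeme",
    "your-secret-here",
}
MIN_LENGTH = 32


def is_strict_secret_mode(env: Mapping[str, Optional[str]]) -> bool:
    flag = env.get("ALLOW_NEXTAUTH_DEV_SECRET") or ""
    if flag in ("true", "1"):
        return False  # explicit opt-out of strict mode
    if flag in ("false", "0"):
        return True   # explicit opt-in
    return env.get("NODE_ENV") == "production"


def check_secret(secret: Optional[str], strict: bool) -> str:
    """Return 'accept' or 'warn'; raise ValueError where the table says 'throws'."""
    value = (secret or "").strip()  # trim before compare, as the guard tests describe
    if not value:
        raise ValueError("NEXTAUTH_SECRET is unset or empty")
    if value in KNOWN_PLACEHOLDERS:
        if strict:
            raise ValueError("NEXTAUTH_SECRET is a known placeholder")
        return "warn"  # dev mode: warn but accept
    if len(value) < MIN_LENGTH and strict:
        raise ValueError("NEXTAUTH_SECRET is shorter than 32 characters")
    return "accept"
```

The placeholder test runs before the length test because one of the shipped placeholders is itself longer than 32 characters.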
Chart wiring
charts/.../caipe-ui/values.yaml ships NODE_ENV: "production" by
default, so the gate is implicitly on for every Helm install. The
chart does NOT provide a default NEXTAUTH_SECRET value (existingSecret
or externalSecrets is required); see the comment block at the
existingSecret: key for the supported wiring patterns.
Install-path coverage
| Surface | Before R4 | After R4 |
|---|---|---|
| Makefile :: run-caipe-ui-docker | -e NEXTAUTH_SECRET=caipe-dev-secret (literal!) | Refuses to start unless $$NEXTAUTH_SECRET is set AND not a placeholder |
| docker-compose.yaml | ${NEXTAUTH_SECRET:-caipe-dev-secret-change-in-production} | ${NEXTAUTH_SECRET:?...must be set...} aborts compose-up |
| docker-compose.dev.yaml | (already default-falls-back to a placeholder) | Unchanged; BFF gate accepts in dev |
| ui/env.example | NEXTAUTH_SECRET=your-secret-here | Updated to call out R4 and link the secrets doc |
| docs/docs/ui/auth-flow.md | NEXTAUTH_SECRET=your-secret-here | Updated with explicit openssl rand -base64 48 cue |
Test coverage
| Layer | Test | What it pins |
|---|---|---|
| Guard unit tests | ui/src/lib/__tests__/nextauth-secret-guard.test.ts | 24 tests covering isStrictSecretMode precedence (NODE_ENV + override flag, both directions, "1"/"0" aliases), strict-mode rejection of every placeholder (including the exact caipe-dev-secret and caipe-dev-secret-change-in-production strings), trim-before-compare, length floor, dev-mode warn-but-accept, and SKILLS_API_SECRET precedence in getSafeNextAuthSecret |
| Downstream consumer | ui/src/lib/__tests__/jwt-validation.test.ts | All 7 existing tests still pass with the guard wired in (no behavioral regression in dev) |
| Operator surface | docker compose up against docker-compose.yaml without NEXTAUTH_SECRET set | Aborts with the embedded error message (test by hand: unset NEXTAUTH_SECRET && docker compose up) |
If a regression weakens the strict-mode floor, the guard unit tests trip; if the Makefile target ever re-introduces the literal placeholder, the embedded shell-level placeholder check refuses to run.
How to do this properly
# values-prod.yaml: production install
caipe-ui:
  existingSecret: caipe-ui-runtime-secrets
  externalSecrets:
    enabled: false
Then create the Secret with a real value:
NEXTAUTH_SECRET="$(openssl rand -base64 48)"
kubectl create secret generic caipe-ui-runtime-secrets \
  --from-literal=NEXTAUTH_SECRET="$NEXTAUTH_SECRET" \
  --from-literal=OIDC_CLIENT_SECRET="…" \
  --from-literal=MONGODB_URI="…"
For ESO/Vault, point caipe-ui.externalSecrets.data[*] at your
NEXTAUTH_SECRET key in the secret store; see
charts/.../caipe-ui/values-external-secrets.yaml for the template.
R3: MongoDB rootPassword strict mode
Background. The caipe-ui-mongodb subchart shipped
auth.rootPassword: "changeme" as its chart default. The default got
materialised directly into the in-cluster Secret by
templates/secret.yaml, so any operator who ran helm install without
either overriding auth.rootPassword or enabling
externalSecrets.enabled=true would ship admin/changeme to MongoDB.
The same changeme then propagated into every MONGODB_URI consumer
(dynamic-agents, supervisor, caipe-ui session store).
This is the same class of issue we fixed for Keycloak caipe-*-dev-secret
under strictClientSecrets. R3 introduces the parallel
strictPasswords flag on the MongoDB subchart.
How it works
When caipe-ui-mongodb.strictPasswords: true AND
caipe-ui-mongodb.externalSecrets.enabled: false, the template helper
mongodb.assertStrictPasswords calls {{ fail }} if
auth.rootPassword is in the known-placeholder set:
"changeme", "change-me", "please-change-me",
"admin", "password", "password123",
"mongo", "mongodb", "root",
"test", "dev", "development",
"secret", "your-password-here", "replace-me"
The helper also enforces an 8-character minimum so very short
dev-leftover values are rejected even when they aren't in the
placeholder list verbatim. The check is case-insensitive
(ChangeMe, CHANGEME both fail).
When externalSecrets.enabled: true, the in-cluster Secret is built
from the external store (Vault, AWS Secrets Manager, GCP Secret
Manager, …) by ESO. The chart's auth.rootPassword value never lands
in etcd, so the strict-mode gate skips its check unconditionally. An
operator who points ESO at a secret that itself contains "changeme"
is solving a different problem than this gate addresses.
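The rules above (placeholder set, case-insensitive match, 8-character floor, ESO bypass) can be modelled as follows. The chart implements them as a Helm template helper, so this Python version is only a sketch of the decision logic; the function name assert_strict_passwords and the error messages are hypothetical.

```python
PLACEHOLDERS = {
    "changeme", "change-me", "please-change-me",
    "admin", "password", "password123",
    "mongo", "mongodb", "root",
    "test", "dev", "development",
    "secret", "your-password-here", "replace-me",
}
MIN_LENGTH = 8


def assert_strict_passwords(root_password: str, strict: bool, eso_enabled: bool) -> None:
    """Raise ValueError where the chart helper would call fail."""
    if not strict or eso_enabled:
        return  # gate is off, or the Secret is built from the external store
    if root_password.lower() in PLACEHOLDERS:  # case-insensitive match
        raise ValueError("auth.rootPassword is a known placeholder")
    if len(root_password) < MIN_LENGTH:
        raise ValueError("auth.rootPassword is shorter than 8 characters")
```

With strict mode on and ESO off, an empty auth.rootPassword also fails the length floor, which matches the expectation that operators supply a real value.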
Migration
Production GitOps installs:
# values.yaml
caipe-ui-mongodb:
  strictPasswords: true
  auth:
    rootPassword: "" # operator MUST set a real value or use ESO
  # OR (preferred for prod):
  externalSecrets:
    enabled: true
    data:
      - secretKey: MONGO_INITDB_ROOT_PASSWORD
        remoteRef:
          key: prod/caipe/mongodb
          property: password
Generate a CSPRNG password with:
openssl rand -base64 24
Default ships strictPasswords: false so the docker-compose dev flow
and CI matrix runs (which intentionally use the placeholder) keep
working unchanged.
Tests
- tests/test_mongodb_strict_passwords.py: 14 chart-render pins covering default-off (placeholder allowed), per-placeholder rejection, case-insensitive rejection, length floor, real-password acceptance, ESO bypass, and back-compat short-password allowance with strict off.
- tests/integration/test_mongodb_strict_passwords.sh: 4-step helm template walk: strict-on + placeholder fails with the docs link in stderr; strict-on + real password renders the Secret; strict-on + ESO bypasses the gate; strict-off (default) preserves the docker-compose flow.
R2: setup-caipe.sh MongoDB random password
Background. The setup-caipe.sh workshop on-ramp installed the
bitnami/mongodb chart with auth.rootPassword=changeme and baked
mongodb://admin:changeme@caipe-mongodb:27017/caipe?authSource=caipe
into four cluster-internal config destinations:
| Site | Destination |
|---|---|
| helm upgrade ... bitnami/mongodb | auth.rootPassword=changeme, auth.passwords[0]=changeme |
| caipe-dynamic-agents-config ConfigMap | MONGODB_URI |
| caipe-supervisor-agent-env ConfigMap | MONGODB_URI |
| caipe-single-node-agent-env ConfigMap | MONGODB_URI |
| caipe-ui-secret Secret | MONGODB_URI |
| Seed values file (dynamic-agents Helm seed) | dynamic-agents.config.MONGODB_URI |
Every operator who ran the workshop on-ramp inherited the same admin password and the same connection string.
How the fix works
A new helper, _resolve_mongodb_password, runs before the
helm install and mirrors the existing LANGFUSE_PASSWORD pattern
that was added earlier:
- Read existing. Try to read caipe-mongodb-credentials.MONGODB_ROOT_PASSWORD (base64-decoded) from the caipe namespace. If present, reuse it, so re-runs of the script stay idempotent and the running MongoDB pod keeps working.
- Generate. Otherwise generate a fresh password with openssl rand -hex 24, which yields 48 hex chars (24 bytes of entropy). Hex is chosen deliberately so the value is URL-safe inside the mongodb://admin:<pw>@... connection string (no @, /, :, ? that would need percent-encoding).
- Persist. Write the password into the caipe-mongodb-credentials Secret via the standard kubectl create … --dry-run=client -o yaml | kubectl apply -f - idempotent-upsert pattern.
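The read / generate / persist sequence above can be sketched in Python with an in-memory dict standing in for the cluster. The real helper is a shell function that talks to kubectl; the name resolve_mongodb_password, the dict-based store, and the use of secrets.token_hex in place of openssl rand -hex 24 are assumptions made for illustration.

```python
import base64
import secrets
from typing import Dict, MutableMapping

SECRET_NAME = "caipe-mongodb-credentials"
KEY = "MONGODB_ROOT_PASSWORD"


def resolve_mongodb_password(cluster: MutableMapping[str, Dict[str, str]]) -> str:
    """cluster maps Secret name -> base64-encoded data, standing in for kubectl."""
    existing = cluster.get(SECRET_NAME, {}).get(KEY)
    if existing:
        # 1. Read existing: reuse it so re-runs stay idempotent.
        return base64.b64decode(existing).decode("utf-8")
    # 2. Generate: 24 random bytes rendered as 48 hex characters (URL-safe).
    password = secrets.token_hex(24)
    # 3. Persist: store it base64-encoded, as a Kubernetes Secret would.
    data = cluster.setdefault(SECRET_NAME, {})
    data[KEY] = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return password


cluster: Dict[str, Dict[str, str]] = {}
first = resolve_mongodb_password(cluster)   # fresh password, persisted
second = resolve_mongodb_password(cluster)  # same password, read back
```

Because the value contains only the characters 0-9 and a-f, it can be dropped into a mongodb:// URI without percent-encoding.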
The variable $MONGODB_ROOT_PASSWORD is then read by every other
site in the script that previously hardcoded changeme. The "Services
Ready" banner prints a one-liner for recovering the password later:
kubectl get secret caipe-mongodb-credentials -n caipe -o jsonpath='{.data}' \
| python3 -c "import sys,json,base64; d=json.load(sys.stdin); \
print('\n'.join(f'{k}: {base64.b64decode(v).decode()}' for k,v in sorted(d.items())))"
Tests
tests/integration/test_setup_caipe_mongodb_password.sh runs the
helper in isolation by extracting its definition from setup-caipe.sh
and mocking kubectl on PATH. The 5 pins cover:
- First run with no existing Secret → fresh password generated and persisted.
- Second run with existing Secret → same password reused (idempotent).
- Password is exactly 48 hex chars, URL-safe.
- Password is never the literal "changeme" (R2 regression guard).
- Grep-based check: changeme no longer appears in any non-comment line of setup-caipe.sh.
Bypass / debug
There is no bypass: every install gets a randomised password. If an operator deliberately wants a known password (e.g. for debugging a backup-restore flow), the simplest path is to pre-create the Secret before running the script:
kubectl -n caipe create secret generic caipe-mongodb-credentials \
  --from-literal=MONGODB_ROOT_USERNAME=admin \
  --from-literal=MONGODB_ROOT_PASSWORD='your-debug-password' \
  --from-literal=MONGODB_DATABASE=caipe
./setup-caipe.sh ...
The helper will read the existing Secret and reuse the chosen value.
What is NOT covered here
- Realm-import secrets baked into the realm JSON: these are consumed before the init Jobs run, so they live in a ConfigMap. If you need to inject sensitive values into the realm itself (e.g. pre-seeded service-account credentials), use a Vault Sidecar + kubectl create configmap --from-file=… workflow instead.
- OIDC IdP-side configuration (creating the OIDC client app in Okta/Duo/etc.): that's owned by your IdP admin. CAIPE only consumes the resulting client_id + client_secret.