
Run with IDPBuilder 🏗️ 💻

IDPBuilder is a tool for creating local Internal Developer Platform environments using KIND clusters. It provides a fast way to deploy and test platform components including ArgoCD, Vault, Backstage, and AI Platform Engineering agents.

Architecture Overview

IDPBuilder creates a KIND cluster and deploys platform components via ArgoCD. The CAIPE (Community AI Platform Engineering) stack adds authentication, secret management, and multi-agent AI capabilities:

Component Flow

  1. IDPBuilder Initialization: Creates KIND cluster and deploys ArgoCD + Gitea as foundation
  2. Platform Deployment: ArgoCD deploys all platform components from Git repositories
  3. Authentication Setup: Keycloak provides SSO for Backstage and other platform services
  4. Secret Management: Vault stores secrets, External Secrets distributes to applications
  5. Developer Access: NGINX Ingress routes traffic, Backstage provides developer portal
  6. AI Integration: Agent-Forge plugin in Backstage connects to CAIPE MAS Agent
  7. Multi-Agent System: Orchestrator manages individual agents for different platform domains

Option 1: Automated Setup from a fresh Ubuntu VM or Ubuntu EC2 instance

Note: These instructions are based on an Ubuntu/Debian VM.

For a complete development environment with i3 desktop and VNC access on a vanilla Ubuntu VM or Ubuntu EC2 instance:

System Requirements:

  • 8 CPUs
  • 16 GB RAM
  • 80 GB Disk Space
  • Docker installed and ready
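
You can quickly confirm the machine meets these requirements before running the setup script (a minimal sketch using standard commands; adjust for your distribution):

# Verify CPU, memory, disk, and Docker availability
nproc        # expect 8 or more CPUs
free -h      # expect roughly 16 GB of RAM
df -h /      # expect roughly 80 GB of free disk space
docker info >/dev/null 2>&1 && echo "Docker is ready" || echo "Docker is NOT ready"
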
IMPORTANT WARNING - READ BEFORE RUNNING 🚨

This script makes SIGNIFICANT system changes and is ONLY intended for fresh Ubuntu VMs/EC2 instances. DO NOT run on your personal machine or production systems!

What This Script Installs:

  • Development Tools: Docker, Kubernetes (kubectl, Kind, K9s), HashiCorp Vault, GitHub CLI, IDPBuilder
  • Desktop Environment: i3 window manager, TigerVNC server, Firefox, Terminator
  • System Utilities: git, vim, jq, curl, wget, and more

⚠️ Important Warnings:

This script will:

  • Remove GNOME desktop if installed
  • Modify system configuration files
  • Add APT repositories and Docker group permissions

✅ Safe for: Fresh Ubuntu VMs/EC2 instances and dedicated dev environments

❌ DO NOT use on: Personal machines, production servers, or systems where you need GNOME

curl -sSL https://raw.githubusercontent.com/sriaradhyula/stacks/refs/heads/main/caipe/scripts/setup-ubuntu-prerequisites.sh -o /tmp/setup-ubuntu-prerequisites.sh \
  && chmod +x /tmp/setup-ubuntu-prerequisites.sh \
  && /tmp/setup-ubuntu-prerequisites.sh

Option 2: Manual Setup

For detailed manual installation and configuration steps, see Manual Setup Guide.

Option 3: Existing Machine Setup

If you're running on an existing machine, ensure you have the following prerequisites installed and that it meets the system requirements listed above.

Required:

Step 2: Clone CAIPE Stack

Clone the CAIPE stack GitHub repository which contains the CAIPE platform and all the components to deploy the IDPBuilder cluster:

git clone https://github.com/sriaradhyula/stacks

CAIPE Runtime/Deployment Profiles

CAIPE (Community AI Platform Engineering) offers multiple runtime/deployment profiles to suit different use cases:

Available Profiles

Profile | Description | Use Case
caipe-basic-p2p | Basic peer-to-peer AI platform with essential components | Quick testing, minimal setup; best for learning
caipe-complete-p2p (recommended) | Full-featured platform with all AI agents and integrations | Production-like; complete CAIPE with A2A peer-to-peer between agents

Step 3: Pick your runtime/deployment CAIPE profile

IDPBuilder supports the different profiles listed above. Pick one of the options below:

Option 1: CAIPE Complete A2A P2P Profile (recommended)

# Run from the directory into which the stacks repository was cloned
# Create cluster with CAIPE complete-p2p profile
idpbuilder create \
  --use-path-routing \
  --package stacks/ref-implementation \
  --package stacks/caipe/base \
  --package stacks/caipe/complete

Option 2: CAIPE Basic A2A P2P Profile (for learning and quick start)

# Create cluster with CAIPE basic-p2p profile
idpbuilder create \
  --use-path-routing \
  --package stacks/ref-implementation \
  --package stacks/caipe/base

Run your preferred runtime option. This process will:

  • Create a KIND cluster
  • Install core platform components
  • Deploy ArgoCD, Vault, and Backstage
  • Configure ingress with path-based routing
note

This takes around 5-10 minutes. Feel free to grab a coffee while it's deploying.

note

Before proceeding to configure secrets, you may want to access ArgoCD to monitor the deployment progress of platform components.

Access ArgoCD and Monitor Deployments

Once the cluster is created, IDPBuilder outputs the ArgoCD URL.

Get ArgoCD Credentials

# Get admin password
idpbuilder get secrets -p argocd

Access ArgoCD

Open https://cnoe.localtest.me:8443/argocd/ and login with:

  • Username: admin
  • Password: From the command above

Monitor the application sync status for the vault and ai-platform-engineering applications. Initial synchronization takes 3-5 minutes as the cluster pulls all the required images.

Verify Vault Application Sync Status

Before proceeding to the next step, verify that the Vault application has synced successfully in ArgoCD, as shown below:

note

If you can't see the Vault application in the ArgoCD UI, go to the second page of the applications list and check whether it is listed there.

ArgoCD Vault Sync
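
If you prefer the CLI, you can also check application sync status with kubectl (assuming ArgoCD Application resources live in the argocd namespace, as in the default IDPBuilder install):

# List ArgoCD applications with their sync and health status
kubectl get applications -n argocd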

Step 4: Configure Vault Secrets

After Vault application syncs successfully on ArgoCD, configure your LLM provider credentials:

note

If you prefer to configure secrets manually through the Vault UI, see Manual Vault Secret Setup.

By default, the ArgoCD and Backstage secrets are already populated in Vault using the local deployment credentials (if the sub-agents are running in your cluster). However, you can override these secrets if you want the sub-agents to target remote ArgoCD and Backstage deployments instead.

The script below will do the following:

  • Prompt you to select your LLM provider, securely collect the required credentials, and store them in the cluster-local Vault automatically
  • Detect all sub-agents running in your cluster and populate their secrets in Vault automatically
  • Refresh the Kubernetes secrets and restart the deployments so all agents pick up the new credentials

If you already have your LLM credentials and other sub-agent secrets in a .env file, you can run the setup-all.sh script to configure them automatically:

info

The script below automatically detects all sub-agents running in your cluster and populates only those secrets in Vault. By default, it does not overwrite secrets that already exist in Vault (such as the ArgoCD and Backstage secrets). If you want to overwrite them, add the --override-all flag to the command.

note

If any secret is missing in the .env file, the script will prompt you to enter it interactively.

stacks/caipe/scripts/setup-all.sh --envFile <path-to-your-env-file>
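
For reference, a .env file might look like the example below. The variable names are purely illustrative, not the script's authoritative keys; the exact names depend on your LLM provider and enabled sub-agents, and the script will prompt for anything it cannot find:

# Hypothetical .env example -- keys shown are illustrative only
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
GITHUB_PERSONAL_ACCESS_TOKEN=ghp_...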

If you don't have a .env file, you can run the script without the --envFile option and it will prompt you to enter your credentials interactively:

stacks/caipe/scripts/setup-all.sh

Once the script is complete, you will see the following message:

✅ Cleanup process completed successfully!
🎯 All secrets and pods have been refreshed in the ai-platform-engineering namespace
✅ All done.

You can also verify the secrets are correctly configured in Vault by checking the Vault UI at https://vault.cnoe.localtest.me:8443/ui/vault/secrets/secret/kv/list/ai-platform-engineering/

When asked for the token to access Vault, use the root token from the command below:

# Extract root token
kubectl get secret vault-root-token -n vault -o jsonpath="{.data}" | \
jq -r 'to_entries[] | "\(.key): \(.value | @base64d)"'
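
If you prefer the CLI over the UI, you can also list the populated secrets from inside the Vault pod. The pod name vault-0 and the secret/ KV mount path are assumptions based on a default Vault deployment and the UI path above:

# List sub-agent secrets via the Vault CLI (substitute the root token extracted above)
kubectl exec -n vault vault-0 -- env VAULT_TOKEN=<root-token> vault kv list secret/ai-platform-engineering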

Step 5: Access RAG WebUI Portal (Optional)

The RAG (Retrieval-Augmented Generation) WebUI provides a powerful interface for ingesting, querying, and managing knowledge bases that enhance your AI agents' capabilities.

Access RAG Ingestion WebUI

Open https://rag-webui.cnoe.localtest.me:8443/ to access the RAG WebUI Portal.

RAG Ingestion Interface

Use the ingestion interface to add documents, websites, and other knowledge sources to your RAG system:

RAG Ingestion Interface

RAG Querying Interface

Query your ingested knowledge bases to test retrieval and get insights from your documents:

RAG Querying Interface

Perform advanced searches with filters, similarity thresholds, and custom parameters:

RAG Advanced Search

The RAG WebUI allows you to:

  • Ingest Documents: Upload PDFs, text files, and web content
  • Query Knowledge: Search and retrieve relevant information
  • Test Retrieval: Validate your knowledge base responses
  • Monitor Performance: Track ingestion and query metrics
  • Configure Settings: Adjust embedding models and search parameters

Step 6: Access Backstage Portal

Get Backstage Credentials

# Get user1 password
idpbuilder get secrets | grep USER_PASSWORD | sed 's/.*USER_PASSWORD=\([^,]*\).*/\1/'

Login to Backstage

Open https://cnoe.localtest.me:8443/ and login with:

  • Username: user1
  • Password: From the command above

Interact with the AI Platform Engineer Assistant

Once logged into the Backstage portal, you can interact with the AI Platform Engineer assistant by clicking on the AI agent icon in the bottom-right corner.

Backstage AI Agent Icon

When you click on the AI agent icon, it will open up the AI Platform Engineer assistant UI where you will be able to interact directly with the agent.

Backstage AI Agent UI

Some example prompts to try out

What can you do?
What ArgoCD version are you using? # If running ArgoCD agent
Get my GitHub profile details # If running GitHub agent

Useful Addresses for IDPBuilder Cluster

  • Backstage: https://cnoe.localtest.me:8443/
  • ArgoCD: https://cnoe.localtest.me:8443/argocd/
  • Vault UI: https://vault.cnoe.localtest.me:8443/ui/
  • RAG WebUI: https://rag-webui.cnoe.localtest.me:8443/

Troubleshooting

Verify Kubernetes Cluster

If you need to verify the cluster status and configuration:

# Check cluster status
kubectl get nodes
# Verify all pods are running
kubectl get pods --all-namespaces
# Check ingress configuration
kubectl get ingress --all-namespaces

Manual Refresh Commands

If you prefer to refresh the secrets and restart the agent deployments manually instead of re-running the setup script:

# Delete all secrets
kubectl delete secret --all -n ai-platform-engineering
# Restart all deployments
kubectl rollout restart deployment -n ai-platform-engineering
# Wait for deployments to be ready
kubectl rollout status deployment -n ai-platform-engineering --timeout=300s
# Check final status
kubectl get pods -n ai-platform-engineering
kubectl get deployments -n ai-platform-engineering

Troubleshooting Not-Ready Pods

If pods are running but not ready (0/1 Ready), the script will automatically:

  • Show pod events to identify startup issues
  • Display container logs for troubleshooting
  • Wait additional time for readiness checks
  • Provide guidance for manual investigation

Common issues and solutions:

  • Readiness probe failures: Check application health endpoints
  • Missing secrets: Ensure secrets are properly configured in Vault
  • Application startup time: Some applications need more time to initialize
  • Dependency issues: Verify all required services are available
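
For manual investigation, these standard kubectl commands are a good starting point (substitute the name of the affected pod):

# Inspect events and recent logs for a pod that is not becoming ready
kubectl describe pod <pod-name> -n ai-platform-engineering
kubectl logs <pod-name> -n ai-platform-engineering --tail=100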

VNC Management Commands

# List VNC sessions
vncserver -list
# Kill VNC session
vncserver -kill :1
# Restart VNC with new resolution
vncserver :1 -geometry 2560x1400 -depth 24 -localhost yes
# Start VNC viewer with SSH tunnel
ssh -i ~/.ssh/private.pem -L 5903:localhost:5901 ubuntu@<YOUR UBUNTU IP> -f -N && vncviewer localhost:5903

Cleanup

# Destroy the cluster and all resources
kind delete cluster --name localdev
# Stop VNC server (if running)
vncserver -kill :1

Problem

Kubernetes pod ai-platform-engineering-supervisor-agent-d8cffb7fb-vg4w4 failing with:

to create fsnotify watcher: too many open files

Root Cause

The default system limit of 1024 file descriptors is insufficient for applications that create multiple file watchers.

Solution 1: Deployment-level Fix

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-platform-engineering-supervisor-agent
  namespace: ai-platform-engineering
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: fs.inotify.max_user_instances
            value: "8192"
          - name: fs.inotify.max_user_watches
            value: "524288"
      containers:
        - name: supervisor-agent
          securityContext:
            capabilities:
              add:
                - SYS_RESOURCE
          resources:
            limits:
              memory: "512Mi"
              cpu: "500m"
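
One way to apply this change is to edit the live Deployment, or to apply the fragment above as a strategic merge patch (the patch file name below is illustrative):

# Edit the Deployment and add the securityContext/sysctls shown above
kubectl edit deployment ai-platform-engineering-supervisor-agent -n ai-platform-engineering

# Or save the fragment above to a file and apply it as a strategic merge patch
kubectl patch deployment ai-platform-engineering-supervisor-agent \
  -n ai-platform-engineering --patch-file sysctl-patch.yaml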

Solution 2: System-wide Fix

# Increase inotify limits
echo 'fs.inotify.max_user_watches=524288' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Restart the pod
kubectl delete pod ai-platform-engineering-supervisor-agent-d8cffb7fb-vg4w4 -n ai-platform-engineering

Verification

# Check current limits
ulimit -n
cat /proc/sys/fs/inotify/max_user_watches

Apply the Deployment changes and the pod will restart automatically with the increased limits.