Multi-Agent Systems and CAIPE
1. Overview
This is the second part of the AI agents lab series. In this part, you'll learn about Multi-Agent Systems (MAS) and build a cloud-native, production-ready MAS using CAIPE (Community AI Platform Engineering). This time you'll deploy to Kubernetes with Helm and Kind, featuring the weather agent and NetUtils agent.
What you'll learn in this part:
- Core concepts of Multi-Agent Systems (MAS)
- Common MAS architecture patterns
- The Agent-to-Agent (A2A) protocol
- How to deploy and interact with a multi-agent system
- How agents coordinate to solve complex, cross-domain problems
- Kubernetes-native deployment with Helm
Prerequisites:
- Completion of Part 1 (Introduction to AI Agents and ReAct Pattern)
- Basic understanding of AI agents and MCP
- A running CAIPE environment on Kubernetes (see below)
- Kind and kubectl installed locally
- Helm installed
Get your environment ready: Before starting this lab, you need CAIPE deployed on a Kubernetes cluster (e.g. Kind). The easiest way is to use the one-command setup script from the ai-platform-engineering repo root:
git clone https://github.com/cnoe-io/ai-platform-engineering.git
cd ai-platform-engineering
./setup-caipe.sh
The script will create a Kind cluster (if needed), deploy CAIPE (supervisor, agents, UI), and prompt you for LLM credentials. When it finishes, you can open the CAIPE UI and run the exercises in this lab. For full options (non-interactive mode, RAG, tracing), see Run CAIPE with KinD.
2. Understanding Multi-Agent Systems
2.1 What is a Multi-Agent System?
A Multi-Agent System (MAS) is an agentic AI system composed of multiple independent agents that interact and coordinate to achieve a common goal. Unlike single agents that handle all tasks themselves, MAS distributes work across specialized agents, each with specific expertise.
[!TIP] Think of MAS like a company: instead of one person doing everything, you have specialists (sales, engineering, support) working together to achieve business goals.
Key characteristics of MAS:
- Specialization: Each agent focuses on a specific domain or capability
- Autonomy: Agents operate independently with their own decision-making
- Coordination: Agents communicate and collaborate to solve complex problems
- Scalability: New agents can be added without redesigning the entire system
2.2 Common MAS Architecture Patterns
There are several proven patterns for organizing multi-agent systems. Let's explore the most common ones:
Network/Swarm Architecture
In this pattern, agents communicate in a network using pub-sub, multicast, or broadcast groups. Each agent is aware of and can hand off tasks to any other agent in the group.
Use cases:
- Distributed problem-solving where any agent can contribute
- Systems requiring high redundancy and fault tolerance
- Scenarios where agent roles are fluid and interchangeable
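The hand-off idea can be sketched with a tiny in-memory topic bus. This is only an illustration: the `Bus` class, the topic names, and the handler functions are hypothetical stand-ins for a real pub-sub transport and real agents.

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """In-memory topic bus standing in for a real pub-sub transport."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], str]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], str]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> list[str]:
        # Every subscriber of the topic sees the message; collect all replies.
        return [handler(message) for handler in self._subscribers[topic]]


bus = Bus()
bus.subscribe("weather", lambda msg: f"weather-agent handled: {msg}")
bus.subscribe("network", lambda msg: f"netutils-agent handled: {msg}")
# A second peer on the same topic gives the redundancy mentioned above.
bus.subscribe("weather", lambda msg: f"backup-weather-agent handled: {msg}")

replies = bus.publish("weather", "forecast for London")
for reply in replies:
    print(reply)
```

Because any peer can subscribe to any topic, no single agent is a required hop, which is what makes roles interchangeable in this pattern.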
Planner/Deep Agent Architecture
Simple ReAct agents can be "shallow": they struggle with longer-running tasks and complex multi-turn conversations. Deep Research agents implement a planner-based architecture to plan tasks and invoke sub-agents, system tools, and human-in-the-loop interactions.
Examples: Claude Code, AWS Kiro CLI, research assistants
Use cases:
- Complex research tasks requiring multiple information sources
- Long-running workflows with checkpoints and human approval
- Tasks requiring strategic planning before execution
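A minimal plan-then-execute loop might look like the sketch below. The fixed plan, the tool names, and the `approve` callback are assumptions made for illustration; a real planner would typically ask an LLM to produce the steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str                     # which sub-agent or tool runs this step
    argument: str                 # input passed to that tool
    needs_approval: bool = False  # human-in-the-loop checkpoint


def make_plan(goal: str) -> list[Step]:
    # Hypothetical fixed plan; a real planner would generate this dynamically.
    return [
        Step("search", goal),
        Step("summarize", "search results"),
        Step("publish", "summary", needs_approval=True),
    ]


def execute(
    plan: list[Step],
    tools: dict[str, Callable[[str], str]],
    approve: Callable[[Step], bool],
) -> list[str]:
    log = []
    for step in plan:
        # Skip steps that require approval when the reviewer rejects them.
        if step.needs_approval and not approve(step):
            log.append(f"{step.tool}: skipped (not approved)")
            continue
        log.append(f"{step.tool}: {tools[step.tool](step.argument)}")
    return log


tools = {
    "search": lambda q: f"3 sources found for '{q}'",
    "summarize": lambda text: f"summary of {text}",
    "publish": lambda text: f"published {text}",
}
log = execute(make_plan("MAS patterns"), tools, approve=lambda step: False)
print("\n".join(log))
```

Separating `make_plan` from `execute` is the point of the pattern: the plan can be inspected, checkpointed, or edited by a human before any step runs.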
Supervisor Architecture
A supervisor agent orchestrates tasks among sub-agents, either within the same system or over a network. The supervisor routes requests, aggregates responses, and maintains overall task coordination.
Use cases:
- Systems with clear task delegation patterns
- Scenarios requiring centralized coordination
- Applications where sub-agents have distinct, non-overlapping capabilities
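The routing and aggregation role of a supervisor can be sketched as follows. Keyword matching is used here purely as a stand-in for the LLM-based routing a real supervisor would perform, and the `Supervisor` class and its sub-agents are hypothetical.

```python
from typing import Callable

Agent = Callable[[str], str]


class Supervisor:
    def __init__(self) -> None:
        self._agents: dict[str, tuple[set[str], Agent]] = {}

    def register(self, name: str, keywords: set[str], agent: Agent) -> None:
        self._agents[name] = (keywords, agent)

    def route(self, request: str) -> list[str]:
        # Pick every sub-agent whose keywords appear in the request.
        words = set(request.lower().replace("?", "").split())
        return [name for name, (keywords, _) in self._agents.items() if keywords & words]

    def handle(self, request: str) -> str:
        names = self.route(request)
        if not names:
            return "No sub-agent can handle this request."
        # Delegate to each matching sub-agent and aggregate the responses.
        return " | ".join(self._agents[name][1](request) for name in names)


supervisor = Supervisor()
supervisor.register("weather", {"weather", "forecast"}, lambda r: "weather: sunny")
supervisor.register("netutils", {"ping", "dns", "reachable"}, lambda r: "netutils: host reachable")

print(supervisor.handle("What is the weather in Paris?"))
print(supervisor.handle("Check the weather and ping the host"))
```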
Hierarchical Supervisor Architecture
This pattern extends the supervisor model with multiple levels: supervisors managing other supervisors. This enables large-scale systems with complex organizational structures.
Use cases:
- Enterprise-scale systems with many specialized agents
- Organizations with complex reporting structures
- Systems requiring multiple levels of abstraction and delegation
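One way to picture the hierarchy is as a tree in which each supervisor delegates downward until a leaf agent serves the request. The node names and domains below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    domains: set[str] = field(default_factory=set)      # domains a leaf agent serves
    children: list["Node"] = field(default_factory=list)

    def covers(self, domain: str) -> bool:
        # A supervisor covers a domain if it or any descendant serves it.
        return domain in self.domains or any(c.covers(domain) for c in self.children)

    def delegate(self, domain: str) -> list[str]:
        """Return the chain of names from this node down to the serving agent."""
        if domain in self.domains:
            return [self.name]
        for child in self.children:
            if child.covers(domain):
                return [self.name] + child.delegate(domain)
        return []


ops = Node("ops-supervisor", children=[Node("netutils-agent", {"network"})])
info = Node("info-supervisor", children=[Node("weather-agent", {"weather"})])
root = Node("root-supervisor", children=[ops, info])

path = root.delegate("weather")
print(" -> ".join(path))
```

Each level only needs to know its direct children, so new teams of agents can be attached under a new mid-level supervisor without changing the root.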
2.3 Benefits of Multi-Agent Systems
Specialization and Expertise
- Each agent can be optimized for specific tasks
- Domain-specific knowledge and tools per agent
- Often better performance on in-domain tasks than a single generalist agent
Scalability
- Add new capabilities by adding new agents
- Scale individual agents based on demand
- No need to retrain or reconfigure existing agents
Maintainability
- Changes to one agent don't affect others
- Easier to debug and test individual components
- Clear separation of concerns
Resilience
- System continues functioning if one agent fails
- Agents can be updated independently
- Graceful degradation of capabilities
3. The Agent-to-Agent (A2A) Protocol
3.1 What is A2A?
The Agent-to-Agent (A2A) Protocol is an open standard that enables AI agents to communicate over the network in a consistent, interoperable way. Instead of every system inventing custom APIs, A2A defines how agents announce their identity, capabilities, and how they exchange requests, responses, and streaming updates.
[!TIP] Think of A2A as "HTTP for AI agents": just as HTTP standardized web communication, A2A standardizes agent communication.
3.2 Agent Cards
Each agent exposes a manifest (typically at .well-known/agent.json) that other agents can discover and use to connect. This manifest is called an agent card.
An agent card contains:
- Identity: Agent name and description
- Capabilities: What the agent can do (its "skills")
- Input/Output schemas: Expected data formats
- UI hints: Optional display information
Example structure (simplified for illustration):
{
  "name": "Weather Agent",
  "description": "Provides weather forecasts and current conditions",
  "capabilities": [
    {
      "name": "get_current_weather",
      "description": "Get current weather for a location",
      "input_schema": { "location": "string" },
      "output_schema": { "temperature": "number", "conditions": "string" }
    }
  ]
}
Other agents don't need to know implementation details. They just see "this agent offers these capabilities" and can safely call them over A2A.
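To make the structure concrete, here is a small sketch that loads a card shaped like the example above into typed objects. It mirrors only the illustrative fields shown in this section, not the full A2A schema, and the `Capability` and `AgentCard` class names are assumptions of this sketch.

```python
import json
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    description: str
    input_schema: dict
    output_schema: dict


@dataclass
class AgentCard:
    name: str
    description: str
    capabilities: list[Capability]


def parse_card(raw: str) -> AgentCard:
    data = json.loads(raw)
    # Each capability entry becomes a typed object; missing list means none.
    caps = [Capability(**entry) for entry in data.get("capabilities", [])]
    return AgentCard(data["name"], data["description"], caps)


RAW_CARD = """
{
  "name": "Weather Agent",
  "description": "Provides weather forecasts and current conditions",
  "capabilities": [
    {
      "name": "get_current_weather",
      "description": "Get current weather for a location",
      "input_schema": { "location": "string" },
      "output_schema": { "temperature": "number", "conditions": "string" }
    }
  ]
}
"""

card = parse_card(RAW_CARD)
for cap in card.capabilities:
    print(f"{card.name} offers {cap.name}({', '.join(cap.input_schema)})")
```

A caller only needs the card to know what to send and what to expect back, which is the loose coupling the text describes.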
3.3 How A2A Enables MAS
A2A makes multi-agent systems practical by providing:
- Discovery: Agents can find and learn about other agents dynamically
- Interoperability: Agents from different vendors can work together
- Loose coupling: Agents don't need to know each other's internals
- Standardization: Common protocol reduces integration complexity
This makes it easy to build systems where a planner agent delegates tasks to specialized agents (search, tools, UI, code execution, etc.) using a shared, well-defined protocol.
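As a rough sketch of discovery, a planner could index the cards it has collected so that it can look up which agent offers a given capability. The card contents below (agent names and capability names such as `ping` and `dns_lookup`) are hypothetical and follow the simplified card shape from section 3.2.

```python
def build_index(cards: list[dict]) -> dict[str, str]:
    """Map each capability name to the name of the agent that offers it."""
    index: dict[str, str] = {}
    for card in cards:
        for cap in card.get("capabilities", []):
            # In this sketch the first agent to advertise a capability wins.
            index.setdefault(cap["name"], card["name"])
    return index


cards = [
    {
        "name": "Weather Agent",
        "capabilities": [{"name": "get_current_weather"}, {"name": "get_forecast"}],
    },
    {
        "name": "NetUtils Agent",
        "capabilities": [{"name": "ping"}, {"name": "dns_lookup"}],
    },
]

index = build_index(cards)
print(index["ping"])          # which agent should receive a ping request
print(sorted(index))          # everything the system can currently do
```

Adding a new agent then amounts to adding its card to the list; the planner code does not change.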
4. Introduction to CAIPE
4.1 What is CAIPE?
CAIPE (Community AI Platform Engineering) is a Multi-Agent System that provides a secure, scalable, persona-driven reference implementation with built-in knowledge base retrieval. It streamlines platform operations, accelerates workflows, and fosters innovation for modern engineering teams.
Key features:
- Production-ready multi-agent architecture
- Built-in A2A protocol support
- Modular agent design for easy extension
- Integration with MCP servers for tool access
- Web UI and CLI for agent interaction
4.2 CAIPE Demo System Architecture
In this lab, you'll deploy a multi-agent system that coordinates information across multiple domains. The system includes:
- Weather Agent: Provides weather forecasts and current conditions
- NetUtils Agent: Offers network diagnostics and connectivity checks
- Supervisor Agent: Central coordinator that orchestrates complex operations requiring data from multiple specialized systems
How it works:
- The weather and NetUtils agents connect to their respective MCP backends
- The supervisor agent communicates with sub-agents using the A2A protocol
- The supervisor exposes its own A2A interface for chat clients (CLI and UI)
This demonstrates agent-to-agent communication where the supervisor intelligently routes requests to specialized agents and combines their responses.
5. Deploy the Multi-Agent System on Kubernetes
Now let's deploy and run the CAIPE multi-agent system using Kubernetes, Helm, and Kind!
Task 1: Verify Helm and OCI Access
The CAIPE Helm chart is published as an OCI artifact on the GitHub Container Registry. Verify you can access it:
helm show chart oci://ghcr.io/cnoe-io/charts/ai-platform-engineering --version 0.2.31
What this does:
- Confirms Helm can pull the CAIPE chart from the OCI registry
- No need to clone the repository or manage local chart files
Task 2: Create a Local Kubernetes Cluster with Kind
If not already running, create a local Kind cluster:
kind create cluster --name caipe
[!TIP] If a Kind cluster named caipe already exists, you do NOT need to recreate it.
To check if the cluster is running, use:
kind get clusters
If you see caipe in the output, your cluster is ready.
To delete and recreate the cluster (if you want a fresh start):
kind delete cluster --name caipe
kind create cluster --name caipe
Before you proceed with deploying CAIPE on Kubernetes, make sure your kubectl context is set to your Kind cluster (caipe).
This ensures all subsequent Kubernetes commands are applied to the correct cluster.
Check your current context:
kubectl config current-context
If the output is not kind-caipe, switch to the Kind cluster context:
kubectl config use-context kind-caipe
You should see:
Switched to context "kind-caipe".
Now you're ready to continue deploying the multi-agent system to the correct Kubernetes environment!
Check your cluster with:
kubectl cluster-info --context kind-caipe
Create a dedicated namespace for the CAIPE deployment:
kubectl create namespace caipe
Task 3: Configure Environment Variables
Configure your LLM credentials as a Kubernetes secret. The Helm chart expects a secret named llm-secret in the caipe namespace:
kubectl create secret generic llm-secret -n caipe \
--from-literal=LLM_PROVIDER='openai' \
--from-literal=OPENAI_API_KEY='sk-xxxxxxx' \
--from-literal=OPENAI_ENDPOINT='https://api.openai.com/v1' \
--from-literal=OPENAI_MODEL_NAME='gpt-5.2'
[!IMPORTANT] Replace the values above with your actual LLM provider credentials from the lab environment.
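For readers curious about what the command stores, the sketch below builds an equivalent Secret manifest by hand. The helper name `secret_manifest` is made up for this example; the key point it illustrates is that Secret values are base64-encoded, which is an encoding and not encryption.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, values: dict[str, str]) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Kubernetes stores Secret data as base64-encoded strings.
        "data": {key: base64.b64encode(val.encode()).decode() for key, val in values.items()},
    }


manifest = secret_manifest(
    "llm-secret",
    "caipe",
    {
        "LLM_PROVIDER": "openai",
        "OPENAI_API_KEY": "sk-xxxxxxx",  # placeholder, as in the command above
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
    },
)
print(json.dumps(manifest, indent=2))
```

Anyone who can read the Secret can decode these values, so access to the caipe namespace should be treated as access to the API key.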
Task 4: Deploy with Helm from the OCI Registry
For this lab, we install the CAIPE Helm chart directly from the OCI registry and pass all configuration via --set flags on the command line, so you don't have to create or edit a values.yaml file.
Why do it this way?
- It's quicker for labs and experiments: no files to edit or keep track of.
- You can see exactly which features are turned on/off in your command.
For the lab, we enable: the UI, supervisor, weather, and NetUtils agents.
Deploy the chart:
helm upgrade --install caipe oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--namespace caipe \
--version 0.2.31 \
--set tags.caipe-ui=true \
--set tags.agent-weather=true \
--set tags.agent-netutils=true \
--set caipe-ui.config.SSO_ENABLED=false \
--set caipe-ui.env.A2A_BASE_URL=http://localhost:8000 \
--wait
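To see how dotted --set paths relate to the nested structure of a values file, here is a simplified approximation in Python. It is not Helm's actual parser: it handles only dotted keys, `[n]` list indexes, and `true`/`false`, and ignores commas, escapes, and numeric or null coercion. The function name `apply_set` is an assumption of this sketch.

```python
import re

# Matches either a plain key segment or a list index such as [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def apply_set(values: dict, flag: str) -> dict:
    """Apply one 'path=value' override to a nested values dict (simplified)."""
    path, _, raw = flag.partition("=")
    # Split 'a.b[0].c' into ['a', 'b', 0, 'c'].
    keys = [int(idx) if idx else name for name, idx in _TOKEN.findall(path)]
    value = {"true": True, "false": False}.get(raw, raw)
    node = values
    for pos, key in enumerate(keys):
        last = pos == len(keys) - 1
        # The next container is a list when the following key is an index.
        child = value if last else ([] if isinstance(keys[pos + 1], int) else {})
        if isinstance(key, int):
            while len(node) <= key:
                node.append(None)
            if last or node[key] is None:
                node[key] = child
        elif last or key not in node:
            node[key] = child
        node = node[key]
    return values


values: dict = {}
for flag in [
    "tags.caipe-ui=true",
    "caipe-ui.config.SSO_ENABLED=false",
    "caipe-ui.env.A2A_BASE_URL=http://localhost:8000",
]:
    apply_set(values, flag)
print(values)
```

Reading the flags this way makes it easier to move them into a values.yaml file later, since each path corresponds to one nested key.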
[!TIP] Corporate VPN / TLS Inspection (macOS): If you are behind a corporate VPN or proxy that performs TLS inspection (e.g., Cisco AnyConnect), the agent pods will fail to connect to external endpoints like api.openai.com with SSL errors such as CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate. This affects the supervisor, weather, and NetUtils agents since they all make outbound HTTPS calls to the LLM provider.
First, export the full macOS certificate trust store (both the system root CAs and any corporate/local certificates) and create a ConfigMap. You must include SystemRootCertificates.keychain to retain the standard root CAs (DigiCert, GlobalSign, etc.) needed to verify public endpoints:
security find-certificate -a -p \
/System/Library/Keychains/SystemRootCertificates.keychain \
/Library/Keychains/System.keychain > /tmp/corp-ca-certs.pem
kubectl create configmap corp-ca-certs -n caipe \
--from-file=ca-certificates.crt=/tmp/corp-ca-certs.pem
Then re-run the Helm install with CA cert flags for all three agents (supervisor, weather, and NetUtils). The --set arguments that contain square brackets are quoted so that zsh, the default macOS shell, does not treat [0] as a glob pattern:
helm upgrade --install caipe oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--namespace caipe \
--version 0.2.31 \
--set tags.caipe-ui=true \
--set tags.agent-weather=true \
--set tags.agent-netutils=true \
--set caipe-ui.config.SSO_ENABLED=false \
--set caipe-ui.env.A2A_BASE_URL=http://localhost:8000 \
--set 'supervisor-agent.volumes[0].name=corp-ca-certs' \
--set 'supervisor-agent.volumes[0].configMap.name=corp-ca-certs' \
--set 'supervisor-agent.volumeMounts[0].name=corp-ca-certs' \
--set 'supervisor-agent.volumeMounts[0].mountPath=/etc/ssl/certs/ca-certificates.crt' \
--set 'supervisor-agent.volumeMounts[0].subPath=ca-certificates.crt' \
--set 'supervisor-agent.volumeMounts[0].readOnly=true' \
--set supervisor-agent.env.SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
--set 'agent-weather.volumes[0].name=corp-ca-certs' \
--set 'agent-weather.volumes[0].configMap.name=corp-ca-certs' \
--set 'agent-weather.volumeMounts[0].name=corp-ca-certs' \
--set 'agent-weather.volumeMounts[0].mountPath=/etc/ssl/certs/ca-certificates.crt' \
--set 'agent-weather.volumeMounts[0].subPath=ca-certificates.crt' \
--set 'agent-weather.volumeMounts[0].readOnly=true' \
--set agent-weather.env.SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
--set 'agent-netutils.volumes[0].name=corp-ca-certs' \
--set 'agent-netutils.volumes[0].configMap.name=corp-ca-certs' \
--set 'agent-netutils.volumeMounts[0].name=corp-ca-certs' \
--set 'agent-netutils.volumeMounts[0].mountPath=/etc/ssl/certs/ca-certificates.crt' \
--set 'agent-netutils.volumeMounts[0].subPath=ca-certificates.crt' \
--set 'agent-netutils.volumeMounts[0].readOnly=true' \
--set agent-netutils.env.SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
--wait
You only need to do this once per Kind cluster. If you already deployed without the CA certs, you can re-run the above command and Helm will upgrade the existing release in place.
You can adjust or add other overrides via additional --set flags as needed. Check the chart values for more configurable options.
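Since the CA certificate overrides repeat the same seven settings for each agent, they can also be generated rather than typed. The sketch below is one way to do that; the helper name `ca_cert_flags` is hypothetical, while the agent keys and paths are the ones used in the command in this section.

```python
import shlex

AGENTS = ["supervisor-agent", "agent-weather", "agent-netutils"]
CERT_PATH = "/etc/ssl/certs/ca-certificates.crt"


def ca_cert_flags(agent: str, configmap: str = "corp-ca-certs") -> list[str]:
    """Build the --set arguments that mount the CA bundle into one agent."""
    overrides = {
        "volumes[0].name": configmap,
        "volumes[0].configMap.name": configmap,
        "volumeMounts[0].name": configmap,
        "volumeMounts[0].mountPath": CERT_PATH,
        "volumeMounts[0].subPath": "ca-certificates.crt",
        "volumeMounts[0].readOnly": "true",
        "env.SSL_CERT_FILE": CERT_PATH,
    }
    flags: list[str] = []
    for key, value in overrides.items():
        flags += ["--set", f"{agent}.{key}={value}"]
    return flags


all_flags = [flag for agent in AGENTS for flag in ca_cert_flags(agent)]
# shlex.join quotes arguments containing brackets so they are shell-safe.
print(shlex.join(all_flags))
```

The printed arguments can be appended to the helm upgrade command, or the list can be passed directly to `subprocess.run` as part of an argument list, which avoids shell quoting altogether.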
This single command pulls the chart from the OCI registry and deploys the full MAS system to your local Kind cluster.
What this does:
- Installs the CAIPE Helm chart version 0.2.31 from the GitHub Container Registry
- Enables the supervisor agent (always included), weather, and NetUtils sub-agents via tags
- Enables the CAIPE web UI
- Schedules each agent as a Kubernetes Deployment and Service
- Sets up service discovery and A2A connectivity between agents
[!IMPORTANT] The deployment may take 1-2 minutes as pods start and agents initialize connections.
To monitor rollout:
kubectl get pods -n caipe
kubectl logs deployment/caipe-supervisor-agent -n caipe
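If you want to check readiness in a script rather than by eye, the text output of kubectl get pods can be parsed. This is a rough sketch: the sample output and pod name suffixes below are invented, and the parser assumes the default column layout (NAME, READY, STATUS, ...).

```python
def pods_ready(kubectl_output: str) -> dict[str, bool]:
    """Map pod name to True when all containers are ready and status is Running."""
    result: dict[str, bool] = {}
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        have, want = ready.split("/")
        result[name] = status == "Running" and have == want
    return result


SAMPLE = """
NAME                                  READY   STATUS              RESTARTS   AGE
caipe-supervisor-agent-abc12          1/1     Running             0          90s
caipe-agent-weather-def34             0/1     ContainerCreating   0          90s
caipe-agent-netutils-ghi56            1/1     Running             0          90s
"""

status = pods_ready(SAMPLE)
not_ready = [name for name, ok in status.items() if not ok]
print("waiting on:", not_ready)
```

For interactive use, `kubectl wait --for=condition=Ready pod --all -n caipe --timeout=180s` achieves a similar result without any parsing.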
6. Verify Agent Deployment
Task 6: Monitor Agent Logs
Let's verify each agent started successfully by checking their logs via kubectl:
Weather Agent
kubectl logs deployment/caipe-agent-weather -n caipe
Expected output:
===================================
WEATHER AGENT CONFIG
===================================
AGENT_URL: http://0.0.0.0:8000
===================================
Running A2A server in p2p mode.
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
What to look for:
- Agent configuration displayed
- A2A server running
- Successful startup and agent card requests
NetUtils Agent
kubectl logs deployment/caipe-agent-netutils -n caipe
Expected output:
===================================
NETUTILS AGENT CONFIG
===================================
AGENT_URL: http://0.0.0.0:8000
===================================
Running A2A server in p2p mode.
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Supervisor Agent
kubectl logs deployment/caipe-supervisor-agent -n caipe
Expected output:
[INFO] [_serve:83] Started server process [1]
[INFO] [startup:48] Waiting for application startup.
[INFO] [startup:62] Application startup complete.
[INFO] [_log_started_message:215] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
What to look for:
- Server process started
- Application startup complete
- Uvicorn running
- Log messages showing successful agent registration/discovery
The supervisor performs dynamic monitoring and removes unavailable agents from the toolset until they return.
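The idea of dropping and restoring agents can be sketched as a periodic filter over the known agents. This is not CAIPE's implementation: the `refresh_toolset` helper, the injected health probe, and the in-cluster URLs (derived from the service names used in this lab) are assumptions for illustration.

```python
from typing import Callable


def refresh_toolset(agents: dict[str, str], is_healthy: Callable[[str], bool]) -> dict[str, str]:
    """Return only the agents whose health probe succeeds right now."""
    return {name: url for name, url in agents.items() if is_healthy(url)}


AGENTS = {
    "weather": "http://caipe-agent-weather:8000",
    "netutils": "http://caipe-agent-netutils:8000",
}

# Simulated outage: the probe reports the NetUtils agent as down.
down = {"http://caipe-agent-netutils:8000"}


def probe(url: str) -> bool:
    return url not in down


active = refresh_toolset(AGENTS, probe)        # netutils is dropped
down.clear()                                   # the agent comes back
active_again = refresh_toolset(AGENTS, probe)  # netutils is restored
print(sorted(active), sorted(active_again))
```

Because the full agent list is kept and only the active view is recomputed, an agent that recovers is picked up again on the next refresh without any re-registration.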
7. Explore Agent Capabilities
Task 7: Inspect Agent Cards
Fetch each agent's capabilities via a port-forwarded service. First, port-forward each service. Each command runs in the foreground, so start each one in its own terminal tab:
kubectl port-forward service/caipe-agent-weather 8002:8000 -n caipe
kubectl port-forward service/caipe-agent-netutils 8014:8000 -n caipe
kubectl port-forward service/caipe-supervisor-agent 8000:8000 -n caipe
Weather Agent Card
curl http://localhost:8002/.well-known/agent.json | jq
What you'll see:
- Agent name and description
- Available weather-related capabilities
- Input/output schemas for each capability
- Endpoint information
NetUtils Agent Card
curl http://localhost:8014/.well-known/agent.json | jq
What you'll see:
- Agent name and description
- Network diagnostic capabilities (e.g., ping, DNS check)
- Input/output schemas for each capability
- Endpoint information
Supervisor Agent Card
curl http://localhost:8000/.well-known/agent.json | jq
What you'll see:
- Combined capabilities from all sub-agents
- Routing and orchestration capabilities
- Aggregated schemas from weather and NetUtils agents
[!NOTE] The supervisor's agent card dynamically reflects the capabilities of all connected sub-agents. This is the power of A2A: automatic capability discovery and aggregation!
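The same card requests can be made from Python using only the standard library. In this sketch the `fetch_card` and `summarize` helpers are hypothetical, and the `name` and `description` fields are read defensively because the exact card contents depend on the deployed agents.

```python
import json
import urllib.request
from typing import Callable


def http_get(url: str) -> str:
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def fetch_card(base_url: str, get: Callable[[str], str] = http_get) -> dict:
    """Download and decode the agent card served under /.well-known/agent.json."""
    return json.loads(get(base_url.rstrip("/") + "/.well-known/agent.json"))


def summarize(card: dict) -> str:
    return f"{card.get('name', '?')}: {card.get('description', '')}"


# With the three port-forwards from this task running, you could call:
#   for port in (8002, 8014, 8000):
#       print(summarize(fetch_card(f"http://localhost:{port}")))
```

The `get` parameter exists so the function can be exercised without a cluster, for example by passing a function that returns a canned JSON string.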
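8. Interact with the Multi-Agent System
Task 8: Open the CAIPE UI
Access the web interface for a visual chat experience. Keep the supervisor port-forward from Task 7 running, because the UI was deployed with A2A_BASE_URL=http://localhost:8000, and port-forward the UI service: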
kubectl port-forward service/caipe-caipe-ui 3000:3000 -n caipe
Then open your browser to http://localhost:3000.
Features:
- Visual chat interface
- Real-time agent responses
- Capability discovery
- Multi-turn conversations
Task 9: Test Agent Discovery
Try these prompts to explore the multi-agent system:
Discover available agents:
What agents are available?
Explore capabilities:
What can you help me with?
Expected behavior:
The supervisor will report capabilities from both the weather and NetUtils agents, demonstrating dynamic capability aggregation.
Task 10: Test Weather Agent
Try weather-specific queries:
Current conditions:
What's the current weather in San Francisco?
Forecast:
Give me a 5-day forecast for London
What's happening behind the scenes:
- Your query goes to the supervisor agent
- The supervisor identifies this as a weather-related request
- The supervisor routes the request to the weather agent via A2A
- The weather agent calls its MCP server to get real data
- The response flows back through the supervisor to you
You can check logs in another terminal tab:
kubectl logs -f deployment/caipe-supervisor-agent -n caipe
Task 11: Test NetUtils Agent
Try network diagnostic queries:
Ping a host:
Check if google.com is reachable.
DNS resolve:
Can you resolve the DNS for api.github.com?
What's happening:
- The supervisor receives your network-related query
- It routes the request to the NetUtils agent
- The NetUtils agent performs diagnostics using its tools/MCP backend
- Results are returned through the supervisor
Task 12: Test Cross-Agent Scenarios
Try queries that require both agents to work together:
Multi-domain query:
Get me today's weather for New York, and also test if api.github.com is reachable. Summarize both results.
Complex reasoning:
Based on current weather in Berlin, do a network check to the local weather data API endpoint and summarize both the weather and the network results.
What's happening:
- The supervisor analyzes the query and identifies which agents are needed
- It calls the weather agent to get weather data
- It calls the NetUtils agent to do the requested checks
- It uses the LLM to reason about and synthesize an answer
Observe how the UI displays agent tool calls, information flow, and the synthesized response.
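The fan-out and synthesis steps can be sketched with asyncio. Everything here is illustrative: the two agent functions return canned strings in place of real A2A calls, the final formatting stands in for LLM synthesis, and whether CAIPE issues the sub-agent calls concurrently or one after another is not stated in this lab.

```python
import asyncio


async def call_weather(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for an A2A round trip
    return "clear skies (canned reply)"


async def call_netutils(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for an A2A round trip
    return "api.github.com reachable (canned reply)"


async def supervise(query: str) -> str:
    # Fan out to both sub-agents at once and wait for both results.
    weather, network = await asyncio.gather(call_weather(query), call_netutils(query))
    # A real supervisor would hand both results to the LLM for synthesis.
    return f"Weather: {weather}. Network: {network}."


summary = asyncio.run(
    supervise("Get today's weather for New York and test if api.github.com is reachable.")
)
print(summary)
```

Running independent sub-agent calls concurrently keeps total latency close to that of the slowest agent rather than the sum of all calls.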
9. Alternative: CLI Chat Client
Task 13: Connect via CLI
You can also interact with the multi-agent system using a text-based CLI client.
Port-forward the supervisor if it is not already forwarded:
kubectl port-forward service/caipe-supervisor-agent 8000:8000 -n caipe
Then run:
uvx https://github.com/cnoe-io/agent-chat-cli.git a2a
[!NOTE] When prompted with "Enter token (optional):", just press Enter. In production, your system will use a JWT or Bearer token for authentication here.
Try a test query:
What's the current weather in San Francisco?
When finished, exit the chat CLI with Ctrl+C.
10. Clean Up
Task 14: Stop the System
When you're done exploring, delete the CAIPE deployment and Kind cluster:
helm uninstall caipe -n caipe
kind delete cluster --name caipe
What this does:
- Gracefully deletes all Kubernetes resources
- Tears down the Kind cluster and underlying containers
11. Summary
Congratulations! You've completed Part 2 of the AI Agents lab series. Here's what you accomplished:
- Understood Multi-Agent System (MAS) concepts and architecture patterns
- Learned about the Agent-to-Agent (A2A) protocol
- Deployed a cloud-native, production MAS with CAIPE using Helm and Kind
- Explored agent cards and capability discovery
- Tested single-agent and cross-agent interactions using weather and network tools
- Used both CLI and web UI to interact with agents
Key Takeaways from Part 2
- Multi-Agent Systems enable specialization - Each agent focuses on what it does best
- A2A protocol standardizes agent communication - Like HTTP for AI agents
- Agent cards enable dynamic discovery - Agents can find and use each other's capabilities
- Supervisor patterns coordinate complex tasks - Central orchestration with specialized sub-agents
- MAS provides resilience and scalability - Systems continue functioning even if individual agents fail
Architecture Patterns Learned
- Network/Swarm: Peer-to-peer agent communication
- Planner/Deep Agent: Strategic planning with sub-agent execution
- Supervisor: Centralized coordination of specialized agents
- Hierarchical: Multi-level supervision for enterprise scale
What's Next?
Continue exploring advanced topics:
- Building custom agents for your domain
- Implementing advanced coordination patterns
- Adding authentication and security
- Scaling multi-agent systems in production
Additional Resources
For deeper exploration:
- Cisco Blog - Deep Dive into MAS: Detailed MAS architecture patterns
- LangChain - Multi-Agent Systems: Framework-specific guidance
- LangChain - Benchmarking Multi-Agent Architectures: Performance comparisons
- CAIPE GitHub Repository: Source code and documentation
- A2A Protocol Specification: Protocol details and standards
Part 2 Complete! You now understand how to build and deploy Kubernetes-native multi-agent systems that coordinate specialized agents, such as weather and network utilities, to solve complex, cross-domain problems.