
AgentOps: Operations and Deployment Guide

Definition of AgentOps

AgentOps covers the lifecycle management and operationalization of AI agentic systems.

  • Agent Registry: A repository for managing agent versions, configurations, and artifact provenance to ensure traceability and reproducibility.
  • Prompt Library: A versioned artifact library for managing and evaluating prompts and their effectiveness.
  • MCP Registry: A repository for managing MCP server versions, configurations, and artifact provenance to maintain accountability and integrity.

Overview

This document describes the comprehensive AgentOps processes we follow for building, testing, deploying, and operating the AI Platform Engineering system. It covers GitHub Actions CI/CD, sanity checks, evaluations, Helm charts, and Kubernetes deployments.

Architecture Philosophy

CAIPE AgentOps follows a microservice deployment architecture for distributed agents. This architecture enables:

  • Independent Scaling: Each agent can be scaled independently based on workload
  • Isolated Deployments: Agents are deployed as separate microservices with their own containers
  • Fault Isolation: Failures in one agent don't cascade to others
  • Technology Flexibility: Different agents can use different runtime configurations
  • Resource Optimization: Allocate resources per agent based on actual needs
  • Independent Lifecycle: Update, rollback, or replace agents without affecting the entire system
  • Container Registry Hosting: Agents and MCP server containers are hosted in container registries like GitHub Container Registry (GHCR) for version control, distribution, and deployment

AgentOps Methodology

The AI Platform Engineering project follows a comprehensive AgentOps methodology that ensures:

  • Automated Testing: Quick and detailed sanity checks validate agent functionality
  • Continuous Integration: GitHub Actions workflows automate builds, tests, and deployments
  • Quality Assurance: Evaluation frameworks validate agent routing and tool usage
  • Infrastructure as Code: Helm charts enable reproducible Kubernetes deployments
  • Observability: Distributed tracing and logging provide visibility into agent operations
  • Microservice Architecture: Distributed agents deployed as independent services

Table of Contents

  • GitHub Actions CI/CD
  • Sanity Checks
  • Evaluations
  • Helm Charts
  • Kubernetes Deployments
  • Monitoring and Observability
  • Best Practices
  • Summary

GitHub Actions CI/CD

Architecture and Implementation

CAIPE uses GitHub Actions for comprehensive CI/CD automation, implementing a sophisticated workflow system that:

  • Change Detection: Intelligently detects which agents/components changed to build only what's necessary
  • Matrix Builds: Parallel builds across multiple agents and platforms for efficiency
  • Security Hardening: Uses step-security/harden-runner to secure workflow execution
  • Multi-Platform Support: Builds for both linux/amd64 and linux/arm64 architectures
  • Caching: Leverages GitHub Actions cache for Docker layers to speed up builds
  • Conditional Execution: Smart conditional logic to skip unnecessary builds

Workflow Overview

Our CI/CD pipeline consists of multiple GitHub Actions workflows that run automatically on code changes:

1. Quick Sanity Integration Tests

Workflow: tests-quick-sanity-integration-dev.yml

Purpose: Fast integration tests that validate core agent functionality

Triggers:

  • Push to main branch
  • Manual trigger via workflow_dispatch

Execution:

Test Configuration: See integration/test_prompts_quick_sanity.yaml

# integration/test_prompts_quick_sanity.yaml
prompts:
  - id: "quick_test_1"
    messages:
      - role: "user"
        content: "Test prompt"
    expected_keywords: ["keyword1", "keyword2"]
    category: "quick_sanity"

Key Steps:

  1. Cleanup: Remove Python cache files from previous runs
  2. Workspace Setup: Ensure directory exists with correct permissions
  3. Checkout: Clone repository code
  4. Secret Management: Create .env from GitHub Secrets (masked in logs)
  5. Docker Setup: Verify Docker and Docker Compose versions
  6. Python Setup: Install Python 3.13 for A2A client tests
  7. Service Startup: Start services with docker compose -f docker-compose.dev.yaml --profile=p2p up -d
  8. Log Streaming: Stream logs in background to file and console
  9. Readiness Check: Wait up to 3 minutes (36 retries × 5s) for service health
  10. Test Execution: Run make quick-sanity which executes test_prompts_quick_sanity.yaml
  11. Artifact Upload: Upload logs on failure for debugging
  12. Cleanup: Stop containers, remove volumes, clean Docker images

Implementation Details:

Dedicated Runner:

  • Uses caipe-integration-tests self-hosted runner
  • Provides persistent Docker daemon
  • Faster startup times compared to GitHub-hosted runners

Readiness Check:

# Checks both health endpoints
curl -sfS http://localhost:8000/ >/dev/null || \
curl -sfS http://localhost:8000/.well-known/agent.json >/dev/null
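
A minimal sketch of the retry loop around these checks, assuming the workflow polls as described (36 attempts × 5s ≈ 3 minutes; the exact script in the workflow may differ):

# Poll the health endpoints until ready or timed out
for attempt in $(seq 1 36); do
  if curl -sfS http://localhost:8000/ >/dev/null || \
     curl -sfS http://localhost:8000/.well-known/agent.json >/dev/null; then
    echo "Service ready after ${attempt} attempt(s)"
    exit 0
  fi
  sleep 5
done
echo "Service did not become ready within 3 minutes" >&2
exit 1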

Log Management:

  • Logs streamed to compose-live.log file
  • Background process allows tests to run while logging
  • Last 300 lines shown on failure
  • Full logs uploaded as artifact

Environment Setup:

  • Creates a .env file from GitHub Actions secrets (illustrated after this list):
    • LLM provider credentials (Azure OpenAI)
    • Agent API tokens (ArgoCD, Backstage, Atlassian, GitHub, PagerDuty, Splunk, Komodor, Slack)
    • A2A transport configuration (A2A_TRANSPORT=p2p)
    • Tracing disabled (ENABLE_TRACING=false)
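
For illustration, the generated .env might contain entries like these (only the variable names cited above come from this doc; values are placeholders):

# Illustrative .env -- values are injected from GitHub Actions secrets and masked in logs
AZURE_OPENAI_API_KEY=***               # LLM provider credential
GITHUB_PERSONAL_ACCESS_TOKEN=***       # one of the agent API tokens
A2A_TRANSPORT=p2p                      # A2A transport configuration
ENABLE_TRACING=false                   # tracing disabled for CI runs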

2. Detailed Sanity Integration Tests

Workflow: tests-detailed-sanity-integration.yml

Purpose: Comprehensive integration tests with verbose output

Triggers:

  • Manual trigger via workflow_dispatch only

Execution:

Differences from Quick Sanity:

  • More comprehensive test coverage
  • Uses production Docker Compose configuration
  • Verbose logging enabled (log_cli_level=INFO)
  • Longer test execution time

3. Tag-Based Sanity Tests

Workflows:

Purpose: Validate that published container images work correctly

Triggers:

  • Push to main branch
  • Scheduled (nightly at 2 AM UTC for latest tag)
  • Manual trigger

Execution:

  • Pulls specific image tags from GitHub Container Registry
  • Runs same quick sanity tests against tagged images
  • Ensures published images maintain quality
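
A hedged sketch of the equivalent local flow (the IMAGE_TAG variable is an assumption; the compose file may pin tags differently):

# Pull a published image and run the quick sanity suite against it
docker pull ghcr.io/cnoe-io/ai-platform-engineering:latest
IMAGE_TAG=latest docker compose -f docker-compose.dev.yaml --profile=p2p up -d
make quick-sanity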

4. Helm Chart Testing

Workflow: helm-chart-test.yml

Purpose: Validate Helm chart templates and configurations

Triggers:

  • Push to main or develop branches (when helm/** changes)
  • Pull requests (when helm/** changes)
  • Scheduled (daily at 2 AM UTC)

Test Matrix:

  • Tests against multiple Helm versions: v3.18.2, v3.17.3

Test Scenarios:

  1. Chart Validation: Lint and package charts
  2. Dependency Management: Verify external dependencies (Milvus, Neo4j, External Secrets)
  3. Installation Tests: Dry-run installations with various configurations
  4. Resource Configuration: Test with custom CPU/memory limits
  5. Ingress Configuration: Test ingress setups
  6. Security Contexts: Validate security settings
  7. Storage Classes: Test persistent volume configurations
  8. Multi-Agent Configurations: Test different agent combinations
  9. SLIM Integration: Test SLIM transport configuration
  10. External Secrets: Validate external secrets integration

Example Test:

helm template test-all-services . \
--set ai-platform-engineering.enabled=true \
--set backstage-plugin-agent-forge.enabled=true \
--set kb-rag-stack.enabled=true \
--set graphrag.enabled=true

5. Helm Chart Publishing

Workflow: helm.yml

Purpose: Automatically publish Helm charts to GitHub Container Registry

Triggers:

  • Push to main branch (when charts/** changes)
  • Pull requests (when charts/** changes)

Process:

1. Version Bump Check (PR only):

Validates that chart versions are bumped when substantive changes are made:

# Detects which charts changed
CHART_CHANGES=$(git diff --name-only origin/$BASE_BRANCH...HEAD | grep "^charts/")

# Checks if changes are more than just Chart.lock
if echo "$CHART_CHANGES" | grep -qv "Chart.lock$"; then
  # Version bump required
  if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
    echo "❌ Error: Chart has changes but version was not bumped!"
    exit 1
  fi
fi

Logic:

  • ✅ Skips version check for Chart.lock-only changes (dependency updates)
  • ❌ Fails PR if substantive changes without version bump
  • ✅ Allows new charts without version check
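
For example, a PR that edits chart templates would also bump the version field in Chart.yaml (version numbers here are illustrative):

# Bump the chart version alongside substantive chart changes
sed -i 's/^version: 0.4.7$/version: 0.4.8/' charts/ai-platform-engineering/Chart.yaml
git add charts/ai-platform-engineering/Chart.yaml
git commit -m "chore: bump ai-platform-engineering chart to 0.4.8"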

2. Chart Packaging:

# Update dependencies
helm dependency update charts/rag-stack/
helm dependency update charts/ai-platform-engineering/

# Package charts
helm package charts/rag-stack/ --destination ./packaged-charts/
helm package charts/ai-platform-engineering/ --destination ./packaged-charts/

Dependency Verification:

# Verify nested dependencies are included
tar -tzf charts/ai-platform-engineering/charts/rag-stack-*.tgz | \
grep -q "^rag-stack/charts/neo4j/Chart.yaml" # Must include neo4j
tar -tzf charts/ai-platform-engineering/charts/rag-stack-*.tgz | \
grep -q "^rag-stack/charts/milvus/Chart.yaml" # Must include milvus

3. Chart Publishing:

REGISTRY="oci://ghcr.io/${{ github.repository_owner }}/helm-charts"

# Check if version already exists
if helm pull $REGISTRY/$CHART_NAME --version $CHART_VERSION; then
echo "⚠️ Version already exists, skipping"
else
helm push "$CHART_FILE" $REGISTRY
fi

Publishing Logic:

  • Only publishes on push to main branch
  • Skips if version already exists (idempotent)
  • Supports multiple charts: rag-stack, ai-platform-engineering
  • Uses OCI registry format (oci://)

Registry: oci://ghcr.io/cnoe-io/helm-charts

Installation:

# OCI registries are not added via `helm repo add`; install directly from the OCI reference
helm install ai-platform-engineering oci://ghcr.io/cnoe-io/helm-charts/ai-platform-engineering

6. Agent Build Workflows

Workflows:

Purpose: Build and publish container images for individual agents and MCP servers

Container Registry: All agent and MCP server containers are hosted in GitHub Container Registry (GHCR) at ghcr.io/cnoe-io/

Implementation Details

1. Change Detection and Path Filtering

Workflows use intelligent change detection to build only affected components. See ci-a2a-sub-agent.yml for implementation:

# Example from ci-a2a-sub-agent.yml
- name: Detect changed paths
  id: filter
  uses: dorny/paths-filter@v3
  with:
    filters: |
      shared:
        - 'ai_platform_engineering/utils/a2a/**'
        - 'build/agents/Dockerfile.a2a'
      github:
        - 'ai_platform_engineering/agents/github/**'
        - '!ai_platform_engineering/agents/github/mcp/**'

Logic:

  • Push to main or tags: Builds all agents
  • Pull Requests: Only builds agents with changed files
  • Shared changes: If shared utilities change, all agents are rebuilt
  • Manual dispatch: Can force build all agents with build_all: true input
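
As a sketch, a full rebuild could be forced from the command line with the GitHub CLI, assuming the build_all input named above:

# Force a build of all agents via manual dispatch
gh workflow run ci-a2a-sub-agent.yml -f build_all=true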

2. Matrix Build Strategy

Agents are built in parallel using GitHub Actions matrix:

strategy:
  matrix:
    agent: ${{ fromJson(needs.determine-agents.outputs.agents) }}
  fail-fast: false  # Continue building other agents if one fails

Supported Agents:

  • A2A Agents: argocd, aws, backstage, confluence, github, jira, komodor, pagerduty, slack, splunk, template, webex, weather
  • MCP Agents: argocd, backstage, confluence, jira, komodor, pagerduty, slack, splunk, webex
  • RAG Components: agent-rag, agent-ontology, server, webui

Example: When ai_platform_engineering/agents/github/ changes, only agent-github and mcp-github images are built. When ai_platform_engineering/utils/a2a/ changes, all agents are rebuilt.
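
The agents list consumed by fromJson() comes from a determine-agents job; a minimal sketch of how such a job might emit its output (the actual implementation is not shown in this doc):

# Hypothetical step inside the determine-agents job
echo 'agents=["github","argocd","jira"]' >> "$GITHUB_OUTPUT"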

3. Security Hardening

All build workflows use step-security/harden-runner:

- name: 🔒 harden runner
  uses: step-security/harden-runner@95d9a5deda9de15063e7595e9719c11c38c90ae2
  with:
    egress-policy: audit  # Monitor outbound connections

Benefits:

  • Prevents supply chain attacks
  • Audits network egress
  • Restricts unnecessary network access

4. Multi-Platform Builds

All images are built for both architectures:

- name: Set up QEMU
  uses: docker/setup-qemu-action@v3

- name: Build and Push Docker image
  uses: docker/build-push-action@v6
  with:
    platforms: linux/amd64,linux/arm64

Process:

  1. Docker Buildx: Sets up advanced build features
  2. QEMU: Enables cross-platform builds (ARM64 on AMD64 runners)
  3. Parallel Builds: Both architectures built simultaneously
  4. Manifest Lists: Creates multi-arch manifests automatically

5. Docker Layer Caching

Uses GitHub Actions cache for faster builds:

cache-from: type=gha  # GitHub Actions cache
cache-to: type=gha,mode=max # Store all layers

Cache Strategy:

  • Cache Key: Based on Dockerfile content and build context
  • Cache Scope: Per workflow run, shared across matrix jobs
  • Cache Mode: max stores all layers for maximum reuse
  • Benefits: Reduces build time by 50-80% on subsequent runs

6. Image Tagging Strategy

Multiple tags are generated automatically:

tags: |
  type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
  type=ref,event=branch,prefix=
  type=ref,event=tag,prefix=
  type=sha,format=short,prefix=

Tag Examples:

  • latest - Latest build from main branch
  • main - Branch name tag
  • v1.2.3 - Semantic version tag
  • abc1234 - Short SHA tag
  • stable - Production release tag (manual)

7. Conditional Push Logic

Images are only pushed on specific conditions:

push: ${{ github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/') }}

Push Conditions:

  • ✅ Push to main branch
  • ✅ Push of tags (releases)
  • ❌ Pull requests (build only, no push)
  • ❌ Feature branches (unless manually triggered)

8. Dockerfile Resolution

Workflows support agent-specific Dockerfiles:

- name: Determine Dockerfile path
  id: dockerfile
  run: |
    if [ -f "${{ env.AGENT_DIR }}/build/Dockerfile.a2a" ]; then
      echo "path=${{ env.AGENT_DIR }}/build/Dockerfile.a2a" >> "$GITHUB_OUTPUT"
    else
      echo "path=build/agents/Dockerfile.a2a" >> "$GITHUB_OUTPUT"  # Fall back to shared Dockerfile
    fi

Priority:

  1. Agent-specific Dockerfile: ai_platform_engineering/agents/{agent}/build/Dockerfile.{type} (e.g., ai_platform_engineering/agents/github/build/Dockerfile.a2a)
  2. Shared Dockerfile: build/agents/Dockerfile.{type} (e.g., build/agents/Dockerfile.a2a)

9. Build Arguments

Agent-specific build arguments:

build-args: |
  AGENT_NAME=${{ matrix.agent }}
  AGENT_PACKAGE=${{ steps.agent_package.outputs.name }}

Special Cases:

  • template agent uses petstore package name
  • MCP agents use AGENT_NAME for MCP server configuration
  • RAG components use component-specific build args

10. Pre-Release Workflows

Separate workflows for PR preview builds:

Workflows:

Trigger: Pull requests with branch prefix prebuild/

Image Naming: ghcr.io/cnoe-io/prebuild/{component}:{pr-number}

Purpose: Preview images for testing before merge

Image Naming Conventions:

  • Agents: ghcr.io/cnoe-io/agent-{name}:{tag} (e.g., ghcr.io/cnoe-io/agent-github:stable)
  • MCP Servers: ghcr.io/cnoe-io/mcp-{name}:{tag} (e.g., ghcr.io/cnoe-io/mcp-argocd:latest)
  • Supervisor: ghcr.io/cnoe-io/ai-platform-engineering:{tag}
  • Backstage Plugin: ghcr.io/cnoe-io/backstage-plugin-agent-forge:{tag}
  • RAG Components: ghcr.io/cnoe-io/caipe-rag-{component}:{tag} (e.g., ghcr.io/cnoe-io/caipe-rag-server:latest)

Example Build Output:

# After pushing to main branch
ghcr.io/cnoe-io/agent-github:latest
ghcr.io/cnoe-io/agent-github:main
ghcr.io/cnoe-io/agent-github:abc1234 # Short SHA

# After tagging v1.2.3
ghcr.io/cnoe-io/agent-github:v1.2.3
ghcr.io/cnoe-io/agent-github:1.2
ghcr.io/cnoe-io/agent-github:1

Registry Benefits:

  • Version Control: Each container image is tagged with specific versions for reproducibility
  • Distribution: Centralized registry enables easy deployment across environments
  • Security: SBOM and attestations provide supply chain security
  • Access Control: GitHub-based authentication and permissions
  • Multi-Platform: Supports both AMD64 and ARM64 architectures
  • Parallel Builds: Matrix strategy builds multiple agents simultaneously
  • Smart Caching: Docker layer caching reduces build times significantly

Workflow Best Practices

  1. Secret Management: All secrets stored in GitHub Actions secrets, never hardcoded
  2. Artifact Retention: Logs and test results retained for debugging
  3. Cleanup: Always clean up Docker resources and workspace
  4. Failure Handling: Upload logs and artifacts on failure for debugging
  5. Parallel Execution: Independent workflows run in parallel for faster feedback

Sanity Checks

Quick Sanity

Command: make quick-sanity

Purpose: Fast validation of core agent functionality

Execution:

cd integration && A2A_PROMPTS_FILE=test_prompts_quick_sanity.yaml uv run pytest -o log_cli=true -o log_cli_level=DEBUG

Test File: integration/test_prompts_quick_sanity.yaml

Characteristics:

  • Minimal test set (5-10 prompts)
  • Fast execution (< 5 minutes)
  • Validates basic routing and responses
  • Used in CI/CD for quick feedback

Example:

# Run quick sanity locally
make quick-sanity

# Output shows:
# ✓ GitHub agent routes correctly
# ✓ ArgoCD agent responds to queries
# ✓ Multi-agent orchestration works
# All tests passed in 3m 42s

Detailed Sanity

Command: make detailed-sanity or make detailed-test

Purpose: Comprehensive validation of all agent capabilities

Execution:

cd integration && A2A_PROMPTS_FILE=test_prompts_detailed.yaml uv run pytest -o log_cli=true -o log_cli_level=INFO

Test File: integration/test_prompts_detailed.yaml

Characteristics:

  • Comprehensive test coverage (20+ prompts)
  • Tests all agent types (GitHub, ArgoCD, Jira, etc.)
  • Validates complex multi-agent workflows
  • Longer execution time (15-30 minutes)
  • Used for pre-release validation

Test Categories:

  • Single agent routing
  • Multi-agent orchestration
  • Tool usage validation
  • Error handling
  • Streaming responses

Example:

# Run detailed sanity
make detailed-sanity

# Tests 20+ scenarios including:
# - "Show GitHub repos and ArgoCD apps" → Parallel routing
# - "Who is on-call?" → Deep agent with PagerDuty + RAG
# - Error handling when agent unavailable

ArgoCD-Specific Sanity

Command: make argocd-sanity

Purpose: Validate ArgoCD agent functionality

Execution:

cd integration && A2A_PROMPTS_FILE=test_prompts_argocd_sanity.yaml uv run pytest -o log_cli=true -o log_cli_level=INFO

Test File: integration/test_prompts_argocd_sanity.yaml

Focus: ArgoCD-specific operations:

  • Application listing
  • Application status checks
  • Resource queries
  • Cluster information

Running Sanity Checks Locally

Prerequisites:

  1. Services running: docker compose -f docker-compose.dev.yaml --profile=p2p up -d
  2. Python 3.13+ with uv package manager
  3. .env file configured with API credentials
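
A possible end-to-end local flow, assuming uv's standard installer and a populated .env:

# Install uv, start the dev stack, then run the quick suite
curl -LsSf https://astral.sh/uv/install.sh | sh
docker compose -f docker-compose.dev.yaml --profile=p2p up -d
make quick-sanity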

Quick Sanity:

make quick-sanity

Detailed Sanity:

make detailed-sanity

ArgoCD Sanity:

make argocd-sanity

Adding New Sanity Tests

Method 1: YAML Configuration (Recommended)

Add to appropriate test file:

prompts:
  - id: "my_new_test"
    messages:
      - role: "user"
        content: "My test prompt"
    expected_keywords: ["keyword1", "keyword2"]
    category: "my_category"

Method 2: Python Test Function

Add to integration/integration_ai_platform_engineering.py:

async def test_my_new_functionality(self):
    """Test my new functionality"""
    response = await send_message_to_agent("my prompt")
    assert response is not None
    assert len(response) > 0
    # Add specific assertions

Evaluations

Overview

The evaluation system provides automated testing of Platform Engineer multi-agent workflows using Langfuse dataset evaluation capabilities.

Architecture

┌─────────────────┐
│   Langfuse UI   │
│    Dashboard    │
└────────┬────────┘
         │ Trigger Evaluation
         ▼
┌─────────────────┐
│ Webhook Service │
│ langfuse_webhook│
└────────┬────────┘
         │ Orchestrate
         ▼
┌─────────────────┐
│   Evaluation    │
│     Runner      │
└────────┬────────┘
    ┌────┴────┐
    │         │
    ▼         ▼
┌────────┐ ┌──────────────┐
│  A2A   │ │    Trace     │
│ Client │ │  Extractor   │
└───┬────┘ └──────┬───────┘
    │             │
    ▼             ▼
┌─────────────────────────┐
│    Platform Engineer    │
│   Multi-Agent System    │
└────────────┬────────────┘
             │ Auto-trace
             ▼
┌─────────────────────────┐
│     Langfuse Server     │
│      Trace Storage      │
└────────────┬────────────┘
             │ Analyze
             ▼
┌─────────────────┐
│ Dual Evaluators │
├─────────────────┤
│     Routing     │
│    Evaluator    │
├─────────────────┤
│   Tool Match    │
│    Evaluator    │
└────────┬────────┘
         │ Submit Scores
         ▼
┌─────────────────┐
│   Langfuse UI   │
│ Results Display │
└─────────────────┘

Dual Evaluator System

The system uses two specialized evaluators:

  1. Routing Evaluator: Validates supervisor-to-agent routing decisions

    • Checks if correct agent was selected
    • Validates routing logic
    • Scores: 0.0 - 1.0
  2. Tool Match Evaluator: Validates agent-to-tool usage patterns

    • Checks if correct tools were used
    • Validates tool parameters
    • Scores: 0.0 - 1.0

Quick Start

1. Start the System:

docker compose -f docker-compose.dev.yaml --profile p2p-tracing up

2. Upload Dataset:

cd evals
python upload_dataset.py datasets/single_agent.yaml

3. Configure Webhook in Langfuse:

  • Navigate to Langfuse UI: http://localhost:3000
  • Go to Datasets → Select your dataset
  • Click "Start Experiment""Custom Experiment" (⚡ button)
  • Set webhook URL: http://evaluation-webhook:8000/evaluate
  • Click "Run" to start evaluation

4. Monitor Results:

  • View evaluation progress in Langfuse dashboard
  • Check individual trace scores and reasoning
  • Analyze routing and tool usage patterns

Dataset Format

name: single_agent_tests
description: Single agent evaluation tests
prompts:
  - id: "github_repo_description"
    messages:
      - role: "user"
        content: "show repo description for ai-platform-engineering"
    expected_agents: ["github"]
    expected_behavior: "Should use GitHub agent to fetch repository description"
    expected_output: "The ai-platform-engineering repository is a platform engineering toolkit..."

Example: Upload and run evaluation:

# Upload dataset
cd evals
python upload_dataset.py datasets/single_agent.yaml

# Trigger evaluation via Langfuse UI
# Navigate to http://localhost:3000 → Datasets → Start Experiment
# Set webhook: http://evaluation-webhook:8000/evaluate
# Click "Run"

# Check results
curl http://localhost:8011/health
# Response shows evaluation status and scores

Evaluation Flow

  1. Dataset Upload: YAML datasets uploaded to Langfuse
  2. Webhook Trigger: Langfuse UI triggers evaluation via webhook
  3. Request Processing: Runner sends prompts to Platform Engineer via A2A
  4. Trace Analysis: Extract tool calls and agent interactions from traces
  5. Dual Evaluation:
    • Route correctness (supervisor → agent)
    • Tool alignment (agent → tool)
  6. Score Submission: Results submitted back to Langfuse with detailed reasoning
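
For debugging, the webhook can presumably be invoked directly; the payload shape below is an assumption, not a documented contract (the local port follows the health endpoint shown under Monitoring):

# Hypothetical manual trigger of the evaluation webhook (payload shape assumed)
curl -X POST http://localhost:8011/evaluate \
  -H "Content-Type: application/json" \
  -d '{"dataset": "single_agent_tests"}'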

Environment Variables

# Langfuse configuration
export LANGFUSE_PUBLIC_KEY="pk-lf-your-key"
export LANGFUSE_SECRET_KEY="sk-lf-your-key"
export LANGFUSE_HOST="http://localhost:3000"

# Platform Engineer connection
export PLATFORM_ENGINEER_URL="http://localhost:8000"

# Optional: LLM evaluation (fallback uses pattern matching)
export OPENAI_API_KEY="your-openai-key"

Monitoring

Service Health:

curl http://localhost:8011/health

Response:

{
  "status": "healthy",
  "langfuse": "configured",
  "evaluators": ["routing", "tool_match"],
  "platform_engineer": "connected"
}

Helm Charts

Chart Structure

charts/
├── ai-platform-engineering/
│   ├── Chart.yaml
│   ├── values.yaml
│   ├── templates/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── ingress.yaml
│   │   ├── secret.yaml
│   │   └── ...
│   └── charts/
│       ├── agent/
│       ├── supervisor-agent/
│       └── backstage-plugin-agent-forge/
└── rag-stack/
    ├── Chart.yaml
    ├── values.yaml
    └── charts/
        ├── agent-rag/
        ├── agent-ontology/
        ├── rag-server/
        └── ...

Main Charts

1. AI Platform Engineering Chart

Location: charts/ai-platform-engineering/

Components:

  • Supervisor agent (Platform Engineer)
  • Individual agent deployments (GitHub, ArgoCD, Jira, etc.)
  • MCP server deployments
  • Backstage plugin agent-forge

Key Features:

  • Microservice Architecture: Each agent deployed as independent Kubernetes deployment
  • Multi-agent orchestration via supervisor agent
  • Configurable prompt types (default, deep_agent, custom)
  • External secrets support
  • Ingress configuration per agent
  • Horizontal Pod Autoscaling (HPA) per agent
  • Resource limits and requests per agent
  • Independent scaling and lifecycle management

2. RAG Stack Chart

Location: charts/rag-stack/

Components:

  • RAG server
  • RAG web UI
  • RAG agents (rag, ontology)
  • Redis (persistent)
  • Neo4j (via dependency)
  • Milvus (via dependency)

Dependencies:

  • Neo4j Helm chart
  • Milvus Helm chart

Prompt Configuration

The chart supports multiple prompt configurations with versioning:

Versioning Strategy

Prompt configuration files are versioned alongside the Helm chart:

  • Chart Version: Tracked in Chart.yaml (version: 0.4.7)
  • App Version: Tracked in Chart.yaml (appVersion: 0.2.1)
  • ConfigMap Labels: Includes version metadata for traceability:
    labels:
      app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}

Prompt Config Files:

  • data/prompt_config.yaml - Default configuration
  • data/prompt_config.deep_agent.yaml - Deep agent configuration

When the Helm chart version is bumped, prompt config changes are included in that chart version, ensuring:

  • Reproducibility: Specific chart versions always use the same prompt config
  • Traceability: ConfigMap labels include version information
  • Rollback Safety: Downgrading chart version restores previous prompt config
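
To confirm which prompt config version is live in a cluster, the ConfigMap's version label can be inspected; the ConfigMap name below is an assumption (list the namespace's ConfigMaps to find the actual name):

# Read the chart version label off the prompt ConfigMap (name assumed)
kubectl get configmap ai-platform-engineering-prompt-config \
  -o jsonpath='{.metadata.labels.app\.kubernetes\.io/version}'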

Default Configuration

promptConfigName: "default"  # Uses data/prompt_config.yaml
  • Balanced orchestrator
  • General platform engineering tasks
  • Medium strictness
  • Versioned with chart release

Deep Agent Configuration

promptConfigName: "deep_agent"  # Uses data/prompt_config.deep_agent.yaml
  • Strict zero-hallucination mode
  • Mission-critical operations
  • High strictness
  • Versioned with chart release

Custom Configuration

promptConfig: |
  agent_name: "My Custom Platform Agent"
  agent_description: |
    Custom description...
  system_prompt_template: |
    Custom system prompt...

  • Overrides versioned configs
  • Not versioned by chart (managed separately)
  • Use for specialized workflows

Versioned Config Selection

The Helm template selects config files based on promptConfigName:

# Template logic in templates/prompt-config.yaml
{{- $configName := .Values.promptConfigName | default "deep_agent" }}
{{ .Files.Get (printf "data/prompt_config.%s.yaml" $configName) | nindent 4 }}

Best Practice: Always specify promptConfigName explicitly in your values file to ensure consistent behavior across upgrades:

# values-secrets.yaml
promptConfigName: "deep_agent" # Explicitly set for production

Example: Deploying with default prompt config:

helm install ai-platform-engineering . \
--values values-secrets.yaml \
--set promptConfigName=default

Example: Deploying with deep agent prompt config:

helm install ai-platform-engineering . \
--values values-secrets.yaml \
--set promptConfigName=deep_agent

Deployment Options

Option 1: Simple Deployment (Port-Forward)

helm install ai-platform-engineering . --values values-secrets.yaml

Access:

kubectl port-forward service/ai-platform-engineering-agent-github 8001:8000
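
With the port-forward active, the agent card endpoint (the same one used by the readiness checks) gives a quick smoke test:

# Confirm the forwarded agent answers on its agent card endpoint
curl -sf http://localhost:8001/.well-known/agent.json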

Option 2: Ingress Deployment

helm install ai-platform-engineering . \
--values values-secrets.yaml \
--values values-ingress.yaml

Configure DNS:

echo "$(minikube ip) agent-github.local" | sudo tee -a /etc/hosts

Secret Management

Option 1: Direct Secrets

cp values-secrets.yaml.example values-secrets.yaml
# Edit with your values
helm install ai-platform-engineering . --values values-secrets.yaml

Option 2: Existing Kubernetes Secrets

agent-argocd:
  secrets:
    secretName: "my-existing-secret"

Option 3: External Secrets

cp values-external-secrets.yaml.example values-external-secrets.yaml
# Configure external secrets store
helm install ai-platform-engineering . --values values-external-secrets.yaml

Chart Testing

Local Testing:

helm template test . --values values-secrets.yaml

Dry-Run Installation:

helm install --dry-run --debug ai-platform-engineering . --values values-secrets.yaml

Chart Linting:

helm lint charts/ai-platform-engineering/

See charts/ai-platform-engineering/ and charts/rag-stack/ for chart source code.

Chart Publishing

Charts are automatically published to GitHub Container Registry on merge to main:

Registry: oci://ghcr.io/cnoe-io/helm-charts

Installation:

# OCI registries are not added via `helm repo add`; install directly from the OCI reference
helm install ai-platform-engineering oci://ghcr.io/cnoe-io/helm-charts/ai-platform-engineering

Kubernetes Deployments

Microservice Architecture

CAIPE follows a microservice deployment architecture where each agent is deployed as an independent Kubernetes service:

  • Independent Deployments: Each agent (GitHub, ArgoCD, Jira, etc.) runs in its own pod/deployment
  • Service Isolation: Agents communicate via A2A protocol, not direct dependencies
  • Individual Scaling: Scale agents independently based on workload (e.g., scale GitHub agent separately from ArgoCD)
  • Resource Allocation: Set CPU/memory limits per agent based on actual usage patterns
  • Health Checks: Each agent has its own liveness and readiness probes
  • Rolling Updates: Update individual agents without affecting others
  • Fault Tolerance: Agent failures are isolated and don't cascade

Example Architecture:

┌─────────────────────────────────────┐
│  Supervisor Agent (Orchestrator)    │
│  - Routes requests to agents        │
│  - Manages multi-agent workflows    │
└──────────────┬──────────────────────┘
               │
       ┌───────┴────────┐
       │  A2A Protocol  │
       └───────┬────────┘
               │
    ┌──────────┼──────────┐
    │          │          │
    ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐
│ GitHub │ │ ArgoCD │ │  Jira  │
│ Agent  │ │ Agent  │ │ Agent  │
│  Pod   │ │  Pod   │ │  Pod   │
└────────┘ └────────┘ └────────┘
    │          │          │
    └──────────┼──────────┘
               │
        ┌──────┴──────┐
        │ MCP Servers │
        │ (Separate)  │
        └─────────────┘

Prerequisites

  • Kubernetes cluster (1.24+)
  • Helm 3.14+
  • kubectl configured
  • Container Registry Access: Access to GitHub Container Registry (ghcr.io/cnoe-io/) where all agent and MCP server containers are hosted
    • Authenticate: echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin
    • Pull images: docker pull ghcr.io/cnoe-io/agent-github:stable

Deployment Steps

1. Configure Secrets

Create values-secrets.yaml:

global:
  imageRegistry: ghcr.io/cnoe-io

ai-platform-engineering:
  enabled: true
  image:
    repository: ai-platform-engineering
    tag: stable

agent-github:
  enabled: true
  secrets:
    secretName: agent-secrets

2. Create Kubernetes Secrets

kubectl create secret generic agent-secrets \
--from-literal=GITHUB_PERSONAL_ACCESS_TOKEN=your-token \
--from-literal=AZURE_OPENAI_API_KEY=your-key

3. Install Chart

helm install ai-platform-engineering . \
--namespace ai-platform \
--create-namespace \
--values values-secrets.yaml

4. Verify Deployment

kubectl get pods -n ai-platform
# Expected output:
# NAME                                   READY   STATUS    RESTARTS   AGE
# ai-platform-engineering-supervisor-0   1/1     Running   0          2m
# agent-github-7d8f9c4b5-abc12           1/1     Running   0          2m
# agent-argocd-6c7e8d9a0-def34           1/1     Running   0          2m

kubectl get services -n ai-platform
kubectl get ingress -n ai-platform

Example: Check specific agent logs:

kubectl logs -n ai-platform deployment/agent-github --tail=50

Resource Management

Resource Limits:

agent-github:
  resources:
    limits:
      cpu: 1000m
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 1Gi

Horizontal Pod Autoscaling:

agent-github:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetCPUUtilizationPercentage: 80

High Availability

Multi-Replica Deployment:

agent-github:
  replicaCount: 3
  podDisruptionBudget:
    enabled: true
    minAvailable: 2

Node Affinity:

agent-github:
  nodeSelector:
    kubernetes.io/os: linux
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - agent-github
            topologyKey: kubernetes.io/hostname

Persistent Storage

Redis Persistence:

rag-redis:
  persistence:
    enabled: true
    storageClass: fast-ssd
    size: 10Gi

Network Policies

Ingress Configuration:

agent-github:
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: agent-github.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: agent-github-tls
        hosts:
          - agent-github.example.com

Monitoring Integration

ServiceMonitor for Prometheus:

agent-github:
  serviceMonitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s

Troubleshooting

View Pod Logs:

kubectl logs -n ai-platform deployment/agent-github -f

Describe Pod:

kubectl describe pod -n ai-platform agent-github-xxx

Check Events:

kubectl get events -n ai-platform --sort-by='.lastTimestamp'

Port Forward for Debugging:

kubectl port-forward -n ai-platform service/agent-github 8001:8000
# Test agent at http://localhost:8001

Example: Debugging a failing agent:

# Check pod status
kubectl get pods -n ai-platform | grep agent-github
# Output: agent-github-xxx 0/1 CrashLoopBackOff 3 2m

# Check logs for errors
kubectl logs -n ai-platform agent-github-xxx --previous

# Check events for resource issues
kubectl describe pod -n ai-platform agent-github-xxx | grep -A 5 Events

# Common issues:
# - ImagePullBackOff: Check registry access
# - CrashLoopBackOff: Check application logs
# - OOMKilled: Increase memory limits

Monitoring and Observability

Distributed Tracing

Langfuse Integration:

  • Automatic trace collection
  • Tool call tracking
  • Agent interaction visualization
  • Performance metrics

Configuration:

agent-github:
  environment:
    - name: ENABLE_TRACING
      value: "true"
    - name: LANGFUSE_PUBLIC_KEY
      valueFrom:
        secretKeyRef:
          name: langfuse-secrets
          key: public-key
    - name: LANGFUSE_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: langfuse-secrets
          key: secret-key
    - name: LANGFUSE_HOST
      value: "http://langfuse-web:3000"

Logging

Log Levels:

  • DEBUG: Detailed debugging information
  • INFO: General operational information
  • WARNING: Warning messages
  • ERROR: Error conditions

Log Aggregation:

  • Structured JSON logging
  • Centralized log collection (via Fluentd/Fluent Bit)
  • Log retention policies

Metrics

Key Metrics:

  • Request rate
  • Response time (p50, p95, p99)
  • Error rate
  • Agent routing decisions
  • Tool usage patterns

Prometheus Integration:

agent-github:
  serviceMonitor:
    enabled: true
    path: /metrics
    port: http

Health Checks

Liveness Probe:

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10

Readiness Probe:

readinessProbe:
  httpGet:
    path: /.well-known/agent.json
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5

Best Practices

Development

  1. Run Quick Sanity Before Committing: make quick-sanity
  2. Run Detailed Sanity Before Release: make detailed-sanity
  3. Test Helm Charts Locally: helm template test . --values values-secrets.yaml
  4. Validate Kubernetes Manifests: kubectl apply --dry-run=client -f manifests/

CI/CD

  1. Version Bumps: Always bump chart versions for substantive changes
  2. Secret Management: Never commit secrets, use GitHub Actions secrets
  3. Artifact Retention: Keep logs and test results for debugging
  4. Parallel Execution: Use workflow dependencies for parallel runs

Deployment

  1. Staging First: Deploy to staging before production
  2. Gradual Rollout: Use canary deployments for major changes
  3. Rollback Plan: Always have a rollback strategy
  4. Monitoring: Set up alerts before deployment

Operations

  1. Resource Limits: Always set resource limits and requests
  2. Health Checks: Configure liveness and readiness probes
  3. Logging: Enable structured logging and centralized collection
  4. Tracing: Enable distributed tracing for production workloads

Summary

This AgentOps guide provides a comprehensive overview of:

  • CI/CD: Automated testing and deployment via GitHub Actions
  • Sanity Checks: Quick and detailed validation of agent functionality
  • Evaluations: Automated assessment of agent routing and tool usage
  • Helm Charts: Infrastructure as code for Kubernetes deployments
  • Kubernetes: Production-ready deployment configurations

Following these practices ensures reliable, scalable, and maintainable AI Platform Engineering deployments.