
Run CAIPE on Amazon EKS

This guide walks you through creating an Amazon EKS (Elastic Kubernetes Service) cluster and deploying CAIPE (Community AI Platform Engineering) on it. No prior experience with CAIPE or EKS is required.

What is EKS? EKS is AWS’s managed Kubernetes service. You get a production-ready cluster without managing control-plane nodes yourself. eksctl is a simple CLI to create and manage EKS clusters with sensible defaults.

What you’ll do: Create an EKS cluster, install ArgoCD (optional, for GitOps-style deploys), then deploy CAIPE using the Helm chart. You’ll need an AWS account and the tools listed below.


Step 1: Clone the repository

You need the repo to use the EKS cluster configuration example and to follow the same paths as this guide.

git clone https://github.com/cnoe-io/ai-platform-engineering.git
cd ai-platform-engineering

The EKS config example lives under deploy/eks/. We’ll use it in a later step.


Step 2: Prerequisites

Install and configure these before creating the cluster:

  • AWS CLI — Authenticate to AWS and run commands (install)
  • eksctl — Create and manage EKS clusters (install)
  • kubectl — Talk to your Kubernetes cluster (install)
  • Helm — Install CAIPE and add-ons (install)

AWS account: Your user or role needs permissions for EC2, EKS, CloudFormation, and IAM (for cluster and node creation). See Required AWS permissions below.


Step 3: Configure AWS credentials

Log in to AWS and confirm your identity:

# Configure AWS CLI (you’ll be prompted for Access Key ID and Secret)
aws configure

# Confirm credentials work
aws sts get-caller-identity

# Optional: set a default region
export AWS_DEFAULT_REGION=us-east-2

Use the same region in the next step when you create the cluster.
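Scripts often need your 12-digit account ID as well (for example, to build IAM policy ARNs later). `aws sts get-caller-identity --query Account --output text` prints it directly; the sketch below parses a sample response offline just to show the JSON shape (the sample values are made up):

```shell
# Real call (needs live credentials):
#   aws sts get-caller-identity --query Account --output text
# Offline sketch: extract the Account field from a sample payload
SAMPLE='{"UserId":"AIDAEXAMPLE","Account":"123456789012","Arn":"arn:aws:iam::123456789012:user/dev"}'
ACCOUNT_ID=$(printf '%s' "$SAMPLE" | sed -n 's/.*"Account":"\([0-9]*\)".*/\1/p')
echo "Account: $ACCOUNT_ID"
```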


Step 4: Create the EKS cluster

The repo includes an example cluster config. Copy it and adjust the region or other settings if needed.

# From the repo root
cp deploy/eks/dev-eks-cluster-config.yaml.example dev-eks-cluster-config.yaml

# Edit if you need to change region, node type, or node count
# (optional) cat dev-eks-cluster-config.yaml
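For orientation, an eksctl cluster config generally looks like the sketch below. The field values here are illustrative; the shipped dev-eks-cluster-config.yaml.example is authoritative.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: dev-eks-cluster     # must match the cluster name used in later commands
  region: us-east-2         # keep in sync with your AWS CLI region
managedNodeGroups:
  - name: worker-nodes
    instanceType: m5.large  # example instance type
    desiredCapacity: 2
    minSize: 1
    maxSize: 3
```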

Create the cluster. This usually takes 10–15 minutes:

eksctl create cluster -f dev-eks-cluster-config.yaml

eksctl will:

  • Create a VPC and subnets
  • Set up the EKS control plane
  • Launch EC2 worker nodes
  • Configure your kubectl context to use the new cluster
  • Install common add-ons

Verify the cluster

# List EKS clusters
eksctl get cluster

# Check that nodes are ready
kubectl get nodes

# Cluster and API server info
kubectl cluster-info

# Optional: list add-ons and system pods
eksctl get addons --cluster dev-eks-cluster
kubectl get pods -n kube-system

Once kubectl get nodes shows nodes in Ready state, you can deploy CAIPE.
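If you'd rather script the readiness check than eyeball it, `kubectl get nodes --no-headers` is easy to parse (and `kubectl wait --for=condition=Ready nodes --all` is the built-in alternative). The sketch below runs against sample output standing in for a live cluster:

```shell
# Sample `kubectl get nodes --no-headers` output (a live cluster would pipe kubectl directly)
SAMPLE='ip-192-168-12-34.us-east-2.compute.internal   Ready   <none>   5m   v1.29.0
ip-192-168-56-78.us-east-2.compute.internal   Ready   <none>   5m   v1.29.0'
# Count nodes whose STATUS column is exactly "Ready"
READY=$(printf '%s\n' "$SAMPLE" | awk '$2 == "Ready" { n++ } END { print n+0 }')
echo "Ready nodes: $READY"
```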


Step 5: Deploy CAIPE on EKS

You have two main options:

Option A: Install CAIPE with Helm

Install the CAIPE Helm chart directly on the cluster. Configure secrets and LLM settings as described in the Helm guide.

helm install ai-platform-engineering oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--version 0.2.8 \
--namespace ai-platform-engineering \
--create-namespace \
--set-string tags.basic=true

Full details: Deploy CAIPE with Helm.

Option B: Use ArgoCD, then deploy CAIPE

ArgoCD keeps your cluster in sync with Git (or Helm) and is useful for ongoing updates. You can install ArgoCD first, then deploy the CAIPE chart through ArgoCD or Helm.

Install ArgoCD on the cluster:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Access the ArgoCD UI (optional):

# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d; echo

# Forward the ArgoCD server port to localhost
kubectl port-forward svc/argocd-server -n argocd 8080:443

Open https://localhost:8080 (the server uses TLS, so accept the self-signed certificate warning) and log in as admin with that password. Then deploy CAIPE via the Helm chart (as in Option A) or by defining an ArgoCD Application that points at the same chart (see Helm setup – ArgoCD).
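If you take the ArgoCD route end to end, an Application pointing at the same chart might look like the sketch below. The chart coordinates mirror the Helm command in Option A; note that an OCI registry like ghcr.io usually has to be registered as a Helm-type repository in ArgoCD first, and the values shown are assumptions, not the chart's full schema.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ai-platform-engineering
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ghcr.io/cnoe-io/charts   # OCI Helm registry (register in ArgoCD first)
    chart: ai-platform-engineering
    targetRevision: 0.2.8
    helm:
      valuesObject:
        tags:
          basic: true
  destination:
    server: https://kubernetes.default.svc
    namespace: ai-platform-engineering
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
```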


For production-style ingress (e.g. LoadBalancer services), install the AWS Load Balancer Controller:

# Create the controller's IAM policy from the project's published policy document
# (ElasticLoadBalancingFullAccess alone is not enough — the controller also calls EC2 and ACM APIs)
curl -fsSL -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json
aws iam create-policy --policy-name AWSLoadBalancerControllerIAMPolicy --policy-document file://iam_policy.json

# Enable IAM roles for service accounts, then create the controller's service account
# (replace <AWS_ACCOUNT_ID> with your account ID)
eksctl utils associate-iam-oidc-provider --cluster=dev-eks-cluster --approve
eksctl create iamserviceaccount \
--cluster=dev-eks-cluster \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--role-name AmazonEKSLoadBalancerControllerRole \
--attach-policy-arn=arn:aws:iam::<AWS_ACCOUNT_ID>:policy/AWSLoadBalancerControllerIAMPolicy \
--approve

# Add the EKS chart repo and install the controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=dev-eks-cluster \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller

# Verify
kubectl get deployment -n kube-system aws-load-balancer-controller

Use your actual cluster name if it’s not dev-eks-cluster (match the name in dev-eks-cluster-config.yaml).


Required AWS permissions

Your AWS user or role needs permissions for:

  • EC2 — Instances, VPC, subnets, security groups
  • EKS — Cluster and node group management
  • CloudFormation — Stacks created by eksctl
  • IAM — Roles and policies for the cluster and node groups

If something fails with “access denied”, check IAM permissions for eksctl and your organisation’s policies.


Troubleshooting

Insufficient permissions

# Identify your IAM user (if you authenticate with an assumed role, use: aws sts get-caller-identity)
aws iam get-user
aws iam list-attached-user-policies --user-name YOUR_USERNAME

Fix by attaching the required policies or using a role that has them.

Region mismatch

Ensure the region in dev-eks-cluster-config.yaml matches your AWS CLI default:

aws configure get region
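To script the comparison, you can pull the region straight out of the config file. The sketch below parses a sample snippet; in practice, point the same extraction at dev-eks-cluster-config.yaml and compare against `aws configure get region`:

```shell
# Sample of the metadata block from an eksctl cluster config
SAMPLE='metadata:
  name: dev-eks-cluster
  region: us-east-2'
# First "region:" value found
CFG_REGION=$(printf '%s\n' "$SAMPLE" | awk '$1 == "region:" { print $2; exit }')
echo "Config region: $CFG_REGION"
```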

Node group creation fails

  • Inspect the CloudFormation stack eksctl created (named eksctl-<cluster>-nodegroup-<nodegroup>):
    aws cloudformation describe-stack-events --stack-name eksctl-dev-eks-cluster-nodegroup-worker-nodes
  • Check your regional EC2 instance quota:
    aws ec2 describe-account-attributes --attribute-names max-instances

kubectl can’t reach the cluster

# Refresh kubeconfig for your cluster (use your region and cluster name)
aws eks update-kubeconfig --region us-east-2 --name dev-eks-cluster

# Confirm current context
kubectl config current-context

Cleanup

When you’re done, delete the cluster to avoid ongoing AWS charges. If you created Ingresses or LoadBalancer services, delete them first so the AWS load balancers they provisioned are cleaned up, then:

eksctl delete cluster -f dev-eks-cluster-config.yaml

Verify that CloudFormation stacks are gone:

aws cloudformation list-stacks --query 'StackSummaries[?contains(StackName, `eksctl-dev-eks-cluster`)].{Name:StackName,Status:StackStatus}'

Important: Always tear down the cluster when you’re not using it to prevent unexpected charges.


Next steps​