ADR: ArgoCD Agent - OOM Protection Strategy

Status: 🟢 In-use
Category: Architecture & Design
Date: November 5, 2025
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

Overview

This document outlines the multi-layered OOM (Out of Memory) protection strategy implemented for the ArgoCD agent to handle large queries safely.

The ArgoCD agent was experiencing OOM crashes when:

Listing all 819+ applications in a single response
Processing large JSON payloads (255KB+) from the ArgoCD API
Producing LLM output that exceeded the 16K token limit, which caused stream disconnection and memory accumulation
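
The first two failure modes above both come from returning an unbounded result in one response. As a minimal sketch of the general remedy (not the agent's actual implementation), the Python below pages a list and checks a serialized payload against a byte budget. The names `paginate` and `fits_budget`, the page size of 20, and the 100,000-byte budget are all assumptions chosen for illustration.

```python
import json
from typing import Any


def paginate(items: list[Any], page: int = 1, page_size: int = 20) -> dict[str, Any]:
    """Return one page of items plus metadata (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total = len(items)
    start = (page - 1) * page_size
    end = min(start + page_size, total)
    return {
        "items": items[start:end],  # empty list when the page is past the end
        "page": page,
        "page_size": page_size,
        "total": total,
        "has_more": end < total,
    }


def fits_budget(payload: Any, max_bytes: int = 100_000) -> bool:
    """Check the UTF-8 size of the JSON-serialized payload (assumed budget)."""
    return len(json.dumps(payload).encode("utf-8")) <= max_bytes
```

With 819 items and a page size of 20, page 41 holds the final 19 items and reports `has_more` as false, so a caller never has to materialize the full list in a single response.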

Pagination Tests (4/4 PASSED):

Memory Usage: ~424 MiB / 4 GiB (10.35%)
OOMKilled: false
Container Status: Stable, running for extended periods

Large Query Test: Request "list all applications" multiple times in rapid succession
Concurrent Query Test: Send 5+ queries simultaneously
Memory Leak Test: Run 100+ queries and monitor memory growth
Edge Case Test: Search for common terms that match 500+ items
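
The memory leak test in the list above could be approximated in-process with Python's standard `tracemalloc` module. This is a sketch under stated assumptions: `measure_growth` and the `run_query` callable are hypothetical stand-ins for whatever actually sends a query to the agent, and `tracemalloc` only tracks Python heap allocations, so it complements container-level memory monitoring rather than replacing it.

```python
import tracemalloc
from typing import Callable


def measure_growth(run_query: Callable[[], object], iterations: int = 100) -> int:
    """Return net growth in traced Python heap bytes over repeated queries.

    `run_query` is a hypothetical callable standing in for one agent query.
    """
    tracemalloc.start()
    try:
        run_query()  # warm-up so one-time allocations are not counted as growth
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            run_query()
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline
```

A query path that retains data between calls shows growth roughly proportional to the iteration count, while a clean path stays near zero. What counts as acceptable growth is a judgment call that this document does not specify.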

The ArgoCD agent now has 5 layers of OOM protection:

Current Status:

Recommended Next Steps:

MCP Tools: ai_platform_engineering/agents/argocd/mcp/mcp_argocd/tools/
Agent Prompt: ai_platform_engineering/agents/argocd/agent_argocd/protocol_bindings/a2a_server/agent.py
Context Management: ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py
Docker Config: docker-compose.dev.yaml
Search Tool: ai_platform_engineering/agents/argocd/mcp/mcp_argocd/tools/search.py