Skip to main content

ADR: ArgoCD Agent OOM Analysis & Resolution

Status: 🟢 In-use Category: Bug Fixes & Performance Date: November 5, 2025 Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

Motivation

The ArgoCD agent was experiencing OOM (Out of Memory) kills in Docker when processing queries that list all ArgoCD applications (819 apps).

Best Practices

For All Agents Handling Large Datasets:

  1. Add Pagination Guidelines to System Prompts

    • Set thresholds (e.g., >50 items → paginate)
    • Provide clear instructions for summary + first N items
    • Inform users about filtering options
  2. Monitor Memory Usage

    • Native: ps -p <PID> -o rss,vsz
    • Docker: docker stats <container>
    • Look for spikes >500MB
  3. Test with Large Datasets

    • Test queries that return max results
    • Monitor memory during response generation
    • Verify streaming completes successfully
  4. LLM Output Limits

    • GPT-4o: ~16K tokens output limit
    • Claude: Similar limits apply
    • Always paginate or summarize large result sets

Testing Results

After Fix:

  • Small queries (10-50 apps): Complete successfully
  • Large queries (819 apps): Return summary + first 20 apps
  • Memory stays under 500MB
  • No stream disconnections
  • No Docker OOM kills