ADR: ArgoCD Agent OOM Analysis & Resolution
Status: 🟢 In-use
Category: Bug Fixes & Performance
Date: November 5, 2025
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Motivation
The ArgoCD agent was experiencing OOM (Out of Memory) kills in Docker when processing queries that list all ArgoCD applications (819 apps).
Best Practices
For All Agents Handling Large Datasets:
- Add Pagination Guidelines to System Prompts
  - Set a threshold (e.g., more than 50 items → paginate)
  - Instruct the model to return a summary plus the first N items
  - Tell users which filtering options can narrow the result
- Monitor Memory Usage
  - Native: `ps -p <PID> -o rss,vsz` (values reported in KiB)
  - Docker: `docker stats <container>`
  - Watch for spikes above 500MB
- Test with Large Datasets
  - Run queries that return the maximum number of results
  - Monitor memory while the response is being generated
  - Verify that streaming completes without disconnecting
- LLM Output Limits
  - GPT-4o: output is capped at roughly 16K tokens
  - Claude: output caps also apply and vary by model version; check the model's documentation
  - Always paginate or summarize large result sets
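The pagination and output-limit guidelines above can be combined into a single gate that runs before results reach the LLM. The following is a minimal Python sketch, not the agent's actual code: `paginate_for_llm`, `PagedResult`, `estimate_tokens`, and the constants are hypothetical, the thresholds simply mirror the numbers in this ADR, and the 4-characters-per-token rule is a rough heuristic standing in for a real tokenizer.

```python
from dataclasses import dataclass

# Assumed tuning values (hypothetical, not taken from the agent's code):
# 50 mirrors the example threshold, 20 mirrors the "first 20 apps" result,
# and 16_000 approximates GPT-4o's output cap.
PAGINATE_THRESHOLD = 50
PAGE_SIZE = 20
MAX_OUTPUT_TOKENS = 16_000
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


@dataclass
class PagedResult:
    total: int
    shown: list[str]
    truncated: bool

    def render(self) -> str:
        """Format a summary line, an optional filtering hint, and the shown items."""
        lines = [f"Found {self.total} applications."]
        if self.truncated:
            # Hypothetical hint text; point users at whatever filters the agent supports.
            lines.append(f"Showing the first {len(self.shown)}; narrow the query "
                         "(for example by project) to see others.")
        lines.extend(f"- {name}" for name in self.shown)
        return "\n".join(lines)


def estimate_tokens(text: str) -> int:
    """Crude token estimate; replace with the model's tokenizer when available."""
    return len(text) // CHARS_PER_TOKEN + 1


def paginate_for_llm(items: list[str]) -> PagedResult:
    """Pass small lists through whole; otherwise keep only the first page."""
    # Check the cheap item count first so large lists are never fully rendered.
    if len(items) <= PAGINATE_THRESHOLD:
        full = PagedResult(total=len(items), shown=items, truncated=False)
        if estimate_tokens(full.render()) <= MAX_OUTPUT_TOKENS:
            return full
    return PagedResult(total=len(items), shown=items[:PAGE_SIZE], truncated=True)
```

For the 819-app case, `paginate_for_llm([f"app-{i}" for i in range(819)]).render()` yields the summary line, the filtering hint, and 20 app names. Checking the count before rendering matters here: building the full 819-item string just to measure it would reintroduce the memory pressure the guideline is meant to avoid.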
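To automate the native memory check, a small wrapper can shell out to the same `ps -p <PID> -o rss,vsz` command and flag spikes. This is a sketch under stated assumptions: `parse_ps_rss_vsz`, `read_rss_vsz`, and `is_spike` are hypothetical helpers, a `ps` binary (Linux or macOS) must be on the path, the 500MB threshold is treated as 500 MiB, and the Docker case (`docker stats`) is not covered.

```python
import subprocess

SPIKE_THRESHOLD_MB = 500  # mirrors the "spikes above 500MB" guideline; treated as MiB


def parse_ps_rss_vsz(output: str) -> tuple[int, int]:
    """Parse `ps -p <PID> -o rss,vsz` output into (RSS, VSZ), both in KiB."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        # Only the header (or nothing) came back: no matching process.
        raise ValueError("process not found or unexpected ps output")
    fields = lines[1].split()
    return int(fields[0]), int(fields[1])


def read_rss_vsz(pid: int) -> tuple[int, int]:
    """Run ps for one PID; raises CalledProcessError if ps exits non-zero."""
    result = subprocess.run(
        ["ps", "-p", str(pid), "-o", "rss,vsz"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_rss_vsz(result.stdout)


def is_spike(rss_kib: int, threshold_mb: int = SPIKE_THRESHOLD_MB) -> bool:
    """True when resident memory exceeds the threshold (KiB compared to MiB * 1024)."""
    return rss_kib > threshold_mb * 1024
```

A polling loop could call `read_rss_vsz(agent_pid)` every few seconds while a large query streams and log whenever `is_spike` returns true; keeping the parser separate from the subprocess call makes it testable without a live process.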
Testing Results
✅ After Fix:
- Small queries (10-50 apps): Complete successfully
- Large queries (819 apps): Return summary + first 20 apps
- Memory stays under 500MB
- No stream disconnections
- No Docker OOM kills
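The large-query result above could be guarded against regressions with a test along these lines. It is a standalone sketch: `make_fake_apps`, `summarize_apps`, and `measure_peak` are hypothetical stand-ins for the agent's real data source and response builder, and the app fields are invented. Note that `tracemalloc` only counts Python-heap allocations made while tracing, so it complements, rather than replaces, process-level checks such as `docker stats`.

```python
import tracemalloc
from typing import Callable, TypeVar

T = TypeVar("T")


def make_fake_apps(count: int) -> list[dict]:
    """Synthetic stand-ins for ArgoCD Application records (hypothetical fields)."""
    return [
        {"name": f"app-{i:04d}", "project": "default", "sync": "Synced", "health": "Healthy"}
        for i in range(count)
    ]


def summarize_apps(apps: list[dict], first_n: int = 20) -> str:
    """Hypothetical response builder: a count summary plus the first N app names."""
    header = f"Found {len(apps)} applications; showing the first {min(first_n, len(apps))}."
    return "\n".join([header, *(f"- {app['name']}" for app in apps[:first_n])])


def measure_peak(fn: Callable[..., T], *args) -> tuple[T, int]:
    """Run fn under tracemalloc and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def test_large_query_is_summarized() -> None:
    apps = make_fake_apps(819)  # same count as the query that triggered the OOM
    response, peak = measure_peak(summarize_apps, apps)
    lines = response.splitlines()
    assert lines[0].startswith("Found 819 applications")
    assert len(lines) == 1 + 20  # summary line + first 20 apps
    assert peak < 500 * 1024 * 1024  # far below the 500MB guideline
```

Running it under pytest, or calling `test_large_query_is_summarized()` directly, checks both the response shape and the allocation peak in one pass.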
Related
- Architecture: architecture.md