Architecture: Platform Engineer Streaming Architecture
Date: 2024-10-23
Table of Contents
- Executive Summary
- Architecture Overview
- Routing Decision Logic
- Streaming Implementation
- Performance Analysis
- Key Findings
- Configuration & Testing
- Monitoring & Debugging
- Future Enhancements
Executive Summary
The Platform Engineer streaming architecture provides intelligent routing (DIRECT, PARALLEL, COMPLEX) for optimal performance while maintaining full backward compatibility with existing A2A clients.
Architecture Overview
Architecture Diagram
Routing Decision Logic
1. Query Analysis (_route_query)
The system analyzes incoming queries using a multi-stage decision tree:
def _route_query(self, query: str) -> RoutingDecision:
query_lower = query.lower()
# Stage 1: Explicit documentation queries
if query_lower.startswith('docs:'):
return RoutingDecision(type=RoutingType.DIRECT, agents=[('RAG', rag_url)])
# Stage 2: Explicit agent mentions
mentioned_agents = []
for agent_name, agent_url in available_agents.items():
if agent_name.lower() in query_lower:
mentioned_agents.append((agent_name, agent_url))
# Stage 3: Route based on agent count and complexity
if len(mentioned_agents) == 0:
return COMPLEX # Deep Agent handles semantic routing
elif len(mentioned_agents) == 1:
return DIRECT # Single agent direct streaming
else:
# Check for orchestration keywords
if needs_orchestration(query):
return COMPLEX # Deep Agent orchestration required
else:
return PARALLEL # Simple parallel execution
2. Routing Types
| Type | Trigger | Execution Path | Streaming Method | Performance |
|---|---|---|---|---|
| DIRECT | docs: prefix or single agent mention | Direct A2A connection | Token-by-token | Fastest |
| PARALLEL | Multiple agents, simple query | Parallel A2A connections | Aggregated chunks | Fast |
| COMPLEX | No agents OR orchestration needed | Deep Agent + System Prompt | Subagent streaming | Comprehensive |
Streaming Implementation
DIRECT Routing (Token-by-Token Streaming)
Path: Client → Platform Engineer → Direct A2A → Sub-agent → Client
async def _stream_from_sub_agent(self, agent_url, query, task, event_queue, trace_id):
"""Direct streaming bypasses Deep Agent for maximum performance"""
# Create direct A2A connection
client = A2AClient(httpx_client=httpx_client, agent_card=agent_card)
# Stream chunks from sub-agent
first_artifact_sent = False
async for response_wrapper in client.send_message_streaming(streaming_request):
if event_kind == 'artifact-update':
# Forward each token immediately (A2A protocol)
await event_queue.enqueue_event(TaskArtifactUpdateEvent(
append=first_artifact_sent, # First: False, subsequent: True
artifact=new_text_artifact(text=token_content),
lastChunk=False
))
first_artifact_sent = True
Characteristics:
- Latency: ~5-8 seconds for typical queries
- Chunks: 400-800+ small token fragments
- Use Cases:
docs:queries, single agent operations - Examples:
docs: duo-sso setup,show me komodor clusters
PARALLEL Routing (Aggregated Streaming)
Path: Client → Platform Engineer → Multiple A2A connections → Aggregation → Client
async def _stream_from_multiple_agents(self, agents, query, task, event_queue):
"""Parallel execution with result aggregation"""
# Execute all agents concurrently
tasks = [stream_single_agent(name, url) for name, url in agents]
results = await asyncio.gather(*tasks, return_exceptions=True)
# Aggregate results with source annotations
combined_output = []
for result in results:
if result.get("status") == "success":
combined_output.append(f"## 📊 {agent_name.upper()} Results\n\n{content}")
else:
combined_output.append(f"## ❌ {agent_name.upper()} Error\n\n{error}")
# Send aggregated result as single artifact
await event_queue.enqueue_event(TaskArtifactUpdateEvent(
append=False,
lastChunk=True,
artifact=new_text_artifact(text="".join(combined_output))
))
Characteristics:
- Latency: ~8-12 seconds (parallel execution)
- Chunks: 3-10 aggregated sections
- Use Cases: Multi-agent queries without orchestration
- Examples:
show me github repos and komodor clusters
COMPLEX Routing (Deep Agent + System Prompt)
Path: Client → Platform Engineer → Deep Agent → System Prompt Analysis → Subagent Streaming → Client
This is where system prompts are primarily used for intelligent decision making.
System Prompt Integration
# In deep_agent.py
system_prompt = """
You are an AI Platform Engineer that helps users manage and operate cloud-native platforms.
## Available Agents
You have access to the following specialist agents as subagents:
- RAG: Documentation and knowledge base queries
- KOMODOR: Kubernetes cluster monitoring and troubleshooting
- GITHUB: Repository management and code operations
- PAGERDUTY: On-call schedules and incident management
- JIRA: Issue tracking and project management
## Instructions
1. Analyze user queries to determine which agents are needed
2. Invoke appropriate subagents using their streaming capabilities
3. Provide comprehensive responses combining multiple sources
4. For knowledge base queries, prefer RAG agent
5. For operational queries, use relevant monitoring/management agents
## Routing Guidelines
- Use RAG for: knowledge base queries, explanations, how-to guides
- Use KOMODOR for: cluster health, pod status, K8s troubleshooting
- Use PAGERDUTY for: on-call information, incident escalation
- Use GITHUB for: repository information, code management
- Combine multiple agents when comprehensive analysis is needed
"""
deep_agent = async_create_deep_agent(
tools=[], # No blocking tools - only streaming subagents
subagents=subagents, # All agents as streaming subagents
instructions=system_prompt, # System prompt guides decisions
model=base_model
)
Streaming via LangGraph Events
# In agent.py - Platform Engineer A2A Binding
async def stream(self, query, context_id, trace_id):
"""Stream via Deep Agent's astream_events for token-level streaming"""
async for event in self.graph.astream_events(inputs, config, version="v2"):
event_type = event.get("event")
if event_type == "on_chat_model_stream":
# Captures both:
# 1. Deep Agent's reasoning (system prompt processing)
# 2. Subagent responses (forwarded from streaming subagents)
chunk = event.get("data", {}).get("chunk")
if chunk and hasattr(chunk, "content"):
yield {
"is_task_complete": False,
"require_user_input": False,
"content": chunk.content, # Token-level content
}
Characteristics:
- Latency: ~15-30 seconds (includes LLM reasoning time)
- Chunks: 1000-3000+ tokens (reasoning + subagent responses)
- Use Cases: Ambiguous queries, multi-step operations, semantic routing
- Examples:
who is on call for SRE?,analyze the platform health
A2A Protocol Integration
Event Types and Flow
# Streaming Protocol Events
TaskArtifactUpdateEvent:
- append: False (first chunk - creates artifact)
- append: True (subsequent chunks - appends to artifact)
- lastChunk: False (more chunks coming)
- lastChunk: True (final chunk)
TaskStatusUpdateEvent:
- state: working (processing)
- state: completed (finished)
- final: False (continuing)
- final: True (task complete)
Client Compatibility
The system supports multiple client types:
- Streaming Clients: Receive token-by-token updates via
TaskArtifactUpdateEventwithappend=True - Non-Streaming Clients: Receive complete final artifact via
TaskArtifactUpdateEventwithlastChunk=True - Legacy Clients: Continue working unchanged with existing A2A protocol
Performance Characteristics
Comprehensive Performance Analysis Results
Test Date: October 2025 Test Coverage: 70 comprehensive scenarios (16 representative scenarios shown) Platform Engineer URL: http://10.99.255.178:8000
Executive Summary
| Mode | Avg Duration | First Chunk | Performance | Rank | Recommendation |
|---|---|---|---|---|---|
| DEEP_AGENT_PARALLEL_ORCHESTRATION_ORCHESTRATION | 4.94s | 0.02s | ⭐⭐⭐⭐⭐ | 🥇 Winner | Production Ready |
| DEEP_AGENT_SEQUENTIAL_ORCHESTRATION | 6.55s | 0.02s | ⭐⭐⭐⭐⭐ | 🥈 2nd | Legacy Compatible |
| DEEP_AGENT_INTELLIGENT_ROUTING | 6.97s | 0.02s | ⭐⭐⭐⭐⭐ | 🥉 3rd | Needs Investigation |
| DEEP_AGENT_ENHANCED_ORCHESTRATION | TBD | TBD | TBD | 🆕 NEW | Experimental |
Detailed Performance Breakdown
DEEP_AGENT_PARALLEL_ORCHESTRATION_ORCHESTRATION Mode (Winner - 4.94s avg)
| Query Category | Sample Query | First Chunk | Total Time | Routing |
|---|---|---|---|---|
| Knowledge Base | docs: duo-sso cli instructions | 0.03s | 4.91s | Deep Agent → RAG |
| Single Agent | show me komodor clusters | 0.02s | 7.44s | Deep Agent → Komodor |
| Multi-Agent | github repos and komodor clusters | 0.01s | 14.99s | Deep Agent → Parallel |
| Complex Analysis | analyze incident patterns | 0.01s | 30.14s | Deep Agent → Complex |
DEEP_AGENT_INTELLIGENT_ROUTING Mode (6.97s avg)
| Query Category | Sample Query | First Chunk | Total Time | Routing |
|---|---|---|---|---|
| Knowledge Base | docs: troubleshooting networks | 0.03s | 5.37s | DIRECT → RAG |
| Single Agent | komodor cluster status | 0.02s | 6.31s | DIRECT → Komodor |
| Multi-Agent | list github repos and clusters | 0.01s | Various | PARALLEL |
| Complex Analysis | compare github with komodor | 0.01s | Various | COMPLEX → Deep Agent |
DEEP_AGENT_SEQUENTIAL_ORCHESTRATION Mode (6.55s avg)
| Query Category | Expected Behavior | Performance | Routing |
|---|---|---|---|
| Knowledge Base | Deep Agent → RAG (sequential) | ~6-7s | Deep Agent Only |
| Single Agent | Deep Agent → Agent (sequential) | ~6-8s | Deep Agent Only |
| Multi-Agent | Deep Agent → Sequential execution | ~8-12s | Deep Agent Only |
| Complex Analysis | Deep Agent → Complex orchestration | ~15-30s | Deep Agent Only |
System Prompt Decision Time
Deep Agent system prompt processing adds ~2-5 seconds for:
- Query analysis and understanding
- Agent selection and reasoning
- Response synthesis and formatting
This overhead is justified by the comprehensive, intelligent responses for complex queries.
Key Findings and Analysis
Surprising Results
- DEEP_AGENT_PARALLEL_ORCHESTRATION outperformed DEEP_AGENT_INTELLIGENT_ROUTING by 29% (4.94s vs 6.97s)
- All modes achieved excellent streaming quality (0.02s first chunk latency)
- Orchestration hints in DEEP_AGENT_PARALLEL_ORCHESTRATION are highly effective
- DEEP_AGENT_INTELLIGENT_ROUTING underperformed expectations - requires investigation
Performance Analysis
- DEEP_AGENT_PARALLEL_ORCHESTRATION: Orchestration hints enable better parallel execution planning
- DEEP_AGENT_INTELLIGENT_ROUTING: Routing decision overhead may be impacting performance
- DEEP_AGENT_SEQUENTIAL_ORCHESTRATION: Predictable baseline with consistent sequential processing
Statistical Significance
- 70 comprehensive test scenarios per routing mode
- 16 representative scenarios used for quick comparisons
- Test distribution: 15 knowledge base, 20 single agent, 15 parallel, 12 complex, 8 mixed
- 100% excellent streaming quality across all modes and scenarios
Production Recommendations
🥇 Primary Recommendation: DEEP_AGENT_PARALLEL_ORCHESTRATION
ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=false
FORCE_DEEP_AGENT_ORCHESTRATION=true
Benefits:
- Best overall performance (4.94s average)
- Consistent orchestration across all query types
- Effective parallel execution through orchestration hints
- Unified intelligence for complex decision making
🥈 Alternative: DEEP_AGENT_SEQUENTIAL_ORCHESTRATION (Legacy)
ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=false
FORCE_DEEP_AGENT_ORCHESTRATION=false
Benefits:
- Reliable baseline performance (6.55s average)
- Predictable behavior across all scenarios
- Original proven behavior with no new dependencies
- Good for conservative environments
🤔 Investigate: DEEP_AGENT_INTELLIGENT_ROUTING
ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=true
FORCE_DEEP_AGENT_ORCHESTRATION=false
Current Issues:
- Unexpectedly slower than Deep Agent modes
- Routing overhead may be affecting performance
- May need optimization or different test scenarios
- Could benefit from profiling the routing decision logic
Feature Flag and Configuration Control
Environment Variables
# Routing Mode Control (mutually exclusive)
ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=true # Intelligent routing (DIRECT/PARALLEL/COMPLEX)
FORCE_DEEP_AGENT_ORCHESTRATION=true # All queries via Deep Agent with parallel hints
# Default: DEEP_AGENT_PARALLEL_ORCHESTRATION (FORCE_DEEP_AGENT_ORCHESTRATION=true, ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=false)
# Knowledge base routing keywords (comma-separated)
KNOWLEDGE_BASE_KEYWORDS="docs:,@docs" # Default: docs: or @docs prefix
KNOWLEDGE_BASE_KEYWORDS="help:,doc:,guide:" # Custom example
# Orchestration detection keywords (comma-separated)
ORCHESTRATION_KEYWORDS="analyze,compare,if,then,create,update,based on,depending on,which,that have" # Default
ORCHESTRATION_KEYWORDS="analyze,evaluate,combine,orchestrate,workflow" # Custom example
Routing Mode Comparison
DEEP_AGENT_INTELLIGENT_ROUTING (Default Production Mode)
ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=true
FORCE_DEEP_AGENT_ORCHESTRATION=false
How it works:
- Intelligent routing - analyzes queries and chooses optimal execution path
- Three routing strategies:
DIRECT: Single sub-agent, direct streaming (fastest)PARALLEL: Multiple sub-agents, parallel streamingCOMPLEX: Deep Agent orchestration (when needed)
Examples:
"docs: setup guide"→ DIRECT to RAG (~5s, token-level streaming)"show me komodor clusters"→ DIRECT to Komodor (~8s, token-level streaming)"github repos and komodor clusters"→ PARALLEL execution (~8s, aggregated results)"who is on call?"→ COMPLEX via Deep Agent (~23s, intelligent orchestration)
Performance: Fastest for simple queries, scales intelligently Use Case: Production (performance + intelligence)
DEEP_AGENT_SEQUENTIAL_ORCHESTRATION (Legacy Mode)
ENABLE_DEEP_AGENT_INTELLIGENT_ROUTING=false
FORCE_DEEP_AGENT_ORCHESTRATION=false
How it works:
- All queries go through Deep Agent (original behavior)
- No orchestration hints or parallel execution guidance
- Deep Agent makes all decisions based purely on system prompt analysis
- Sequential execution - agents called one after another
Examples:
"docs: setup guide"→ Deep Agent → RAG (~15s, sequential)"show me komodor clusters"→ Deep Agent → Komodor (~18s, sequential)"github repos and komodor clusters"→ Deep Agent → Sequential GitHub then Komodor (~25s)"who is on call?"→ Deep Agent → Sequential PagerDuty then RAG (~25s)
Performance: Slowest - all queries have orchestration overhead + sequential execution Use Case: Legacy compatibility and baseline comparison
Examples by Routing Type
DIRECT Routing Examples
# Knowledge base queries (→ RAG agent) - using default KNOWLEDGE_BASE_KEYWORDS
"docs: duo-sso setup instructions"
"docs: kubernetes deployment guide"
"@docs troubleshooting network issues"
# Custom knowledge base keywords (if KNOWLEDGE_BASE_KEYWORDS="help:,guide:,@help")
"help: setup authentication"
"guide: container deployment"
"@help network configuration"
# Single agent operations (explicit agent mentions)
"show me komodor clusters" # → Komodor agent
"list github repositories" # → GitHub agent
"show pagerduty schedules" # → PagerDuty agent
PARALLEL Routing Examples
# Multi-agent simple queries
"show me github repos and komodor clusters"
"list jira issues and github pull requests"
"get pagerduty schedules and komodor alerts"
COMPLEX Routing Examples
# Semantic routing (system prompt determines agents)
"who is on call for SRE?" # → PagerDuty + RAG
"what is the escalation policy?" # → RAG (semantic)
"analyze the current platform health" # → Multiple agents + synthesis
"create a deployment plan for the new service" # → Multiple agents + orchestration
# Orchestration required
"if there are any failing pods, create jira tickets for them"
"analyze cluster health and update the documentation"
"check on-call status and escalate if issues found"
Error Handling and Fallbacks
Graceful Degradation
# Direct routing failure → fallback to Deep Agent
if routing.type == RoutingType.DIRECT:
try:
await self._stream_from_sub_agent(agent_url, query, task, event_queue)
return # Success
except Exception as e:
logger.warning(f"Direct streaming failed: {str(e)[:100]}")
logger.info("Falling back to Deep Agent for intelligent orchestration")
# Fall through to Deep Agent path
# System continues with COMPLEX routing using system prompt
Backward Compatibility
- All existing A2A clients continue to work unchanged
- Original Deep Agent behavior preserved when feature flag disabled
- Standard A2A protocol events maintained
- No breaking changes to existing integrations
Monitoring and Debugging
Log Patterns
Enhanced Streaming Mode
🎛️ Routing Mode: DEEP_AGENT_INTELLIGENT_ROUTING - Intelligent routing (DIRECT/PARALLEL/COMPLEX)
🎯 Routing decision: direct - Knowledge base query (matched: docs:) - direct to RAG
🚀 DIRECT MODE: Streaming from RAG at http://agent-rag:8000
🌊 PARALLEL MODE: Streaming from github, komodor
Deep Agent Parallel Mode
🎛️ Routing Mode: DEEP_AGENT_PARALLEL_ORCHESTRATION - All queries via Deep Agent with parallel orchestration
🤖 Detected agents in query for parallel orchestration: ['GITHUB', 'KOMODOR']
🎛️ DEEP_AGENT_PARALLEL_ORCHESTRATION mode: Routing to Deep Agent with parallel orchestration hints
Deep Agent Only Mode
🎛️ Routing Mode: DEEP_AGENT_SEQUENTIAL_ORCHESTRATION - All queries via Deep Agent (original behavior)
🤖 Deep Agent: Analyzing query for agent requirements
🤖 Deep Agent: Invoking RAG subagent for documentation query
Future Enhancements
Planned Improvements
- Adaptive Routing: Machine learning-based routing decisions
- Caching Layer: Cache frequently accessed documentation
- Load Balancing: Distribute load across multiple agent instances
- Advanced Orchestration: More sophisticated multi-agent workflows
- Real-time Monitoring: Dashboard for routing performance and health
System Prompt Evolution
The Deep Agent system prompt will be enhanced to:
- Better understand query intent and complexity
- Optimize agent selection based on historical performance
- Provide more sophisticated reasoning for multi-step operations
- Support custom user preferences and routing rules
Conclusion
The Platform Engineer streaming architecture provides optimal performance through intelligent routing while maintaining full backward compatibility. The three-tier routing system (DIRECT, PARALLEL, COMPLEX) ensures the best user experience for different query types, with system prompts enabling sophisticated decision-making for complex scenarios.
Key benefits:
- Performance: 3-5x faster for direct queries
- Flexibility: Handles simple operations and complex orchestration
- Compatibility: Zero breaking changes
- Scalability: Efficient resource utilization
- Intelligence: System prompt-driven decision making for complex queries
Related
- Spec: spec.md