Sub-Agent Tool Message Streaming Analysis
Status: In-use | Category: Architecture & Core Design | Date: October 25, 2024
Note: This is a historical debugging/investigation document from October 2024. For comprehensive A2A protocol documentation with actual event data, see A2A Event Flow Architecture.
Overview
This document tracks the October 2024 investigation into enhanced transparency for sub-agent tool messages in the CAIPE streaming architecture, and the implementation work that followed. The goal was to make detailed sub-agent tool executions visible to end users for better debugging and transparency.
Document Purpose: Historical record of debugging process (October 2024), architectural limitations discovered, and implementation attempts.
Date: October 25, 2024
Problem Statement
Users were only seeing high-level supervisor notifications like:
🔧 Calling argocd...
✅ argocd completed
But not the detailed sub-agent tool messages like:
🔧 Calling tool: **version_service__version**
✅ Tool **version_service__version** completed
Architecture Discovery
Through extensive debugging, we mapped the complete event flow from sub-agents to end users.
Key Technical Discoveries
1. LangGraph Streaming Architecture Limitation
Critical Finding: LangGraph has two streaming modes with different event handling capabilities:
- `astream_events` (primary): Handles native LangGraph events (`on_tool_start`, `on_chat_model_stream`, `on_tool_end`)
- `astream` (fallback): Handles custom events from `get_stream_writer()`
The Issue: Custom events generated by get_stream_writer() are not processed by astream_events, even though they are successfully generated and logged.
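To illustrate the limitation, here is a minimal sketch assuming LangGraph's `get_stream_writer()` and `stream_mode="custom"` APIs; the graph, node name, and event payload are illustrative and are not the actual CAIPE supervisor code. The custom payload is delivered when the graph is consumed via `astream(stream_mode="custom")`, while the `astream_events` loop only yields native events.

```python
# Minimal sketch of the two consumption paths, assuming LangGraph's
# get_stream_writer() and stream_mode="custom" APIs. The graph, node name,
# and payload below are illustrative, not the actual CAIPE supervisor code.
import asyncio

from langgraph.config import get_stream_writer
from langgraph.graph import END, START, MessagesState, StateGraph


def call_sub_agent(state: MessagesState):
    writer = get_stream_writer()
    # Custom event emitted from inside a node (what the A2A client code does)
    writer({"type": "a2a_event", "data": "🔧 Calling tool: **version_service__version**"})
    return {"messages": []}


builder = StateGraph(MessagesState)
builder.add_node("call_sub_agent", call_sub_agent)
builder.add_edge(START, "call_sub_agent")
builder.add_edge("call_sub_agent", END)
app = builder.compile()


async def main():
    # Fallback path: astream(stream_mode="custom") delivers the payload
    async for chunk in app.astream({"messages": []}, stream_mode="custom"):
        print("custom chunk:", chunk)

    # Primary path: astream_events yields native events (on_chain_start, ...);
    # in the setup described here, the custom payload was never surfaced this way
    async for event in app.astream_events({"messages": []}, version="v2"):
        print("event:", event["event"])


asyncio.run(main())
```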
2. Event Processing Pipeline
The complete event processing pipeline:
Sub-Agent → Status-Update Events → A2A Client → Stream Writer → Custom Events → [DROPPED] → User
Supervisor → LangGraph Events → astream_events → Tool Notifications → [SUCCESS] → User
3. Working vs Non-Working Events
✅ Working (Visible to User):
- Execution plans with ⟦⟧ markers
- Supervisor tool notifications: 🔧 Calling argocd...
- Supervisor completion notifications: ✅ argocd completed
❌ Not Working (Captured but Not Visible):
- Sub-agent tool details: 🔧 Calling tool: **version_service__version**
- Sub-agent completions: ✅ Tool **version_service__version** completed
- Detailed sub-agent responses (captured and accumulated but not streamed to the user)
Implementation Changes Made
1. Removed Status-Update Filtering
File: ai_platform_engineering/utils/a2a_common/a2a_remote_agent_connect.py
Before:
if text and not text.startswith(('🔧', '✅', '❌', '📋')):
    accumulated_text.append(text)
    logger.debug(f"✅ Accumulated text from status-update: {len(text)} chars")
After:
if text:
    accumulated_text.append(text)
    # Stream status-update text immediately for real-time display
    writer({"type": "a2a_event", "data": text})
    logger.info(f"✅ Streamed + accumulated text from status-update: {len(text)} chars")
Impact: All sub-agent tool messages are now captured, and an attempt is made to stream each one.
2. Enhanced Error Handling
File: ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py
Added:
import asyncio

# In main streaming loop
except asyncio.CancelledError:
    logging.info("Primary stream cancelled by client disconnection")
    return

# In fallback streaming loop
except asyncio.CancelledError:
    logging.info("Fallback stream cancelled by client disconnection")
    return
Impact: Graceful handling of client disconnections without server-side errors.
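For context, the `except` blocks above sit inside the agent's async streaming generators. A simplified sketch of that pattern is shown below; the class name, method, and response shape are illustrative placeholders, not the exact code in agent.py.

```python
import asyncio
import logging


class PlatformEngineerAgentSketch:
    """Illustrative placeholder for the real agent in agent.py."""

    def __init__(self, graph):
        self.graph = graph  # compiled LangGraph supervisor (placeholder)

    async def stream(self, inputs: dict, config: dict):
        try:
            # Primary streaming loop (astream_events)
            async for event in self.graph.astream_events(inputs, config, version="v2"):
                yield {
                    "is_task_complete": False,
                    "require_user_input": False,
                    "content": event.get("event", ""),
                }
        except asyncio.CancelledError:
            # Client disconnected mid-stream: exit quietly instead of raising an error
            logging.info("Primary stream cancelled by client disconnection")
            return
```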
3. Custom Event Handler (Attempted)
File: ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py
Added:
# Handle custom events from sub-agents (like detailed tool messages)
elif event_type == "on_custom":
    custom_data = event.get("data", {})
    if isinstance(custom_data, dict) and custom_data.get("type") == "a2a_event":
        custom_text = custom_data.get("data", "")
        if custom_text:
            logging.info(f"Processing custom a2a_event: {len(custom_text)} chars")
            yield {
                "is_task_complete": False,
                "require_user_input": False,
                "content": custom_text,
                "custom_event": {
                    "type": "sub_agent_detail",
                    "source": "a2a_tool"
                }
            }
Impact: This handler was added but never triggered due to LangGraph's architecture limitations.
4. Logging Enhancement
Changed: Debug-level logs to INFO-level for better visibility during debugging.
Impact: Confirmed that status-update events are being processed correctly:
✅ Streamed + accumulated text from status-update: 45 chars
✅ Streamed + accumulated text from status-update: 46 chars
✅ Streamed + accumulated text from status-update: 400+ chars
Current Status
✅ Successfully Implemented
- Transparent status-update processing: all sub-agent messages are captured and processed
- Real-time streaming infrastructure: events are immediately passed to the stream writer
- Robust error handling: client disconnections are handled gracefully
- Enhanced logging: full visibility into the event processing pipeline
- Comprehensive architecture mapping: complete understanding of the event flow
❌ Architectural Limitation
- Custom events not displayed: LangGraph's `astream_events` mode does not process custom events from `get_stream_writer()`
- Sub-agent tool details not visible: users still don't see detailed tool execution steps
Current User Experience
What Users See:
⟦🎯 Execution Plan: Retrieve ArgoCD Version Information⟧
🔧 Calling argocd...
✅ argocd completed
[Final response with version details]
What Users Don't See (but is captured):
🔧 Calling tool: **version_service__version**
✅ Tool **version_service__version** completed
Possible Solutions
Option 1: Force Fallback Mode
Modify the supervisor to use astream instead of astream_events to enable custom event processing.
Pros: Would display detailed sub-agent tool messages
Cons: Might lose token-level streaming capabilities
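A rough sketch of what Option 1 could look like, assuming the supervisor's response generator consumes the compiled graph with `stream_mode="custom"`; the function name, `app`, and the yielded dict shape are illustrative:

```python
# Option 1 sketch: have the supervisor's response generator consume the graph
# via astream(stream_mode="custom") so payloads written with get_stream_writer()
# reach the user. `app` and the yielded dict shape are illustrative placeholders.
async def stream_via_fallback(app, inputs: dict, config: dict):
    async for chunk in app.astream(inputs, config, stream_mode="custom"):
        if isinstance(chunk, dict) and chunk.get("type") == "a2a_event":
            yield {
                "is_task_complete": False,
                "require_user_input": False,
                "content": chunk.get("data", ""),
            }
```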
Option 2: Enhanced Supervisor Notifications
Add more detailed information to supervisor-level tool notifications using available metadata.
Pros: Works within the current architecture
Cons: Limited detail compared to actual sub-agent messages
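A sketch of Option 2, building a richer supervisor notification from the metadata already carried by the `on_tool_start` event; the helper function and response dict are illustrative, not existing code:

```python
# Option 2 sketch (illustrative): enrich the supervisor-level notification with
# metadata already present on the on_tool_start event from astream_events.
def enrich_tool_notification(event: dict) -> dict | None:
    if event.get("event") != "on_tool_start":
        return None
    tool_name = event.get("name", "tool")
    tool_input = event.get("data", {}).get("input", {})
    return {
        "is_task_complete": False,
        "require_user_input": False,
        "content": f"🔧 Calling {tool_name} with input: {tool_input}",
    }
```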
Option 3: Hybrid Approach
Use both streaming modes or implement custom event bridging.
Pros: Best of both worlds
Cons: Increased complexity
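One possible shape of the hybrid approach, assuming LangGraph accepts a list of stream modes and yields `(mode, chunk)` tuples; again, `app` and the yielded dicts are illustrative:

```python
# Option 3 sketch: request token-level "messages" output and "custom" sub-agent
# events in a single stream, assuming LangGraph accepts a list of stream modes
# and yields (mode, chunk) tuples. `app` is an illustrative compiled graph.
async def hybrid_stream(app, inputs: dict, config: dict):
    async for mode, chunk in app.astream(inputs, config, stream_mode=["messages", "custom"]):
        if mode == "messages":
            token, _metadata = chunk  # LLM token chunk plus its metadata
            yield {"source": "llm", "content": getattr(token, "content", "")}
        elif mode == "custom" and isinstance(chunk, dict):
            # Whatever the sub-agent handler passed to the stream writer
            yield {"source": "sub_agent", "content": chunk.get("data", "")}
```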
Files Modified
- ai_platform_engineering/utils/a2a_common/a2a_remote_agent_connect.py
- ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py
Testing Validation
Test Command
curl -X POST http://10.99.255.178:8000 \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"id":"test","method":"message/stream","params":{"message":{"role":"user","parts":[{"kind":"text","text":"show argocd version"}],"messageId":"msg-test"}}}'
Log Validation
docker logs platform-engineer-p2p --since=2m | grep -E "(Streamed.*accumulated|Processing.*custom)"
Expected Output:
✅ Streamed + accumulated text from status-update: 45 chars
✅ Streamed + accumulated text from status-update: 46 chars
✅ Streamed + accumulated text from status-update: 400+ chars
Next Steps
- Decision on solution approach: choose between forcing fallback mode, enhancing supervisor notifications, or a hybrid approach
- Implementation: based on the chosen solution
- Testing: validate that detailed tool messages reach end users
- Documentation updates: update this diagram as changes are implemented
Current Status & Updated Documentation
⚠️ Historical Document: This document captures the investigation as of October 25, 2024.
For the current, comprehensive A2A protocol documentation with actual event data, real-world examples, and complete event flow analysis, see:
A2A Event Flow Architecture (2025-10-27)
What's included in the new documentation:
- ✅ Complete architecture flowchart (Client → Supervisor → Sub-Agent → MCP → Tools)
- ✅ Detailed sequence diagram showing all 6 phases of execution
- ✅ Actual A2A event structures from real tests
- ✅ Token-by-token streaming analysis with append flags
- ✅ Comprehensive event type reference (task, artifact-update, status-update)
- ✅ Event count metrics (600+ events for a simple query)
- ✅ Frontend integration examples
- ✅ Testing commands for both supervisor and sub-agents
Use cases:
- Understanding the A2A protocol: → New doc
- Debugging streaming issues: → This doc (historical context)
- Implementing frontend clients: → New doc
- Understanding architectural limitations: → This doc
Investigation Date: October 25, 2024
Document Status: Historical - See 2025-10-27-a2a-event-flow-architecture.md for current documentation
Findings: Infrastructure Complete - Architecture Limitation Identified
Outcome: LangGraph streaming limitation documented; sub-agent tool details not visible to end users via astream_events