
Orphaned Tool Call Repair for Bedrock Multi-Turn Conversations

Status: Implemented
Category: Bug Fix / Resilience
Date: February 24, 2026
PRs: #842 (supervisor fixes), #31 (OTel fix)

Overview

This change set improves supervisor resilience during multi-turn conversations with sub-agent delegations when AWS Bedrock is the LLM provider. It addresses two issues: orphaned tool calls that permanently break conversations, and a response_format incompatibility with Bedrock's Converse API.

Problem Statement

1. Orphaned Tool Calls Break Multi-Turn Conversations

Symptom: After 2-3 turns involving sub-agent delegation, users see:

✅ I've recovered from an interrupted tool call. Let me continue processing your request...
❌ Recovery retry failed. Please ask your question again.

Root Cause: When a sub-agent call (e.g., AWS_Agent, GitHub_Agent) times out or the client disconnects mid-stream, LangGraph records an AIMessage with tool_calls but no corresponding ToolMessage. On the next turn, Bedrock's Converse API rejects the conversation with:

ValidationException: Expected toolResult blocks at messages.0.content
for the following Ids: tooluse_y6Ma8ihoB4Lqbmm4bumT7p

Impact: Conversation becomes permanently broken for that context. Users must start a new session.

Frequency: Common in multi-turn conversations with sub-agent delegations, especially when responses are large (ArgoCD listing 800+ apps, GitHub listing many PRs).
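The broken state can be reduced to a minimal sketch, with plain dicts standing in for LangChain's AIMessage/ToolMessage (the ID, query, and helper name here are invented for illustration): an interrupted turn leaves a tool call with no matching result, which a simple set difference exposes.

```python
# Minimal sketch of the broken checkpoint state (plain dicts stand in for
# LangChain messages; the ID and query are invented for illustration).
history = [
    {"type": "human", "content": "List failing ArgoCD apps"},
    # Stream cancelled here: the tool call was checkpointed, but its
    # ToolMessage result was never written.
    {"type": "ai", "content": "",
     "tool_calls": [{"id": "tooluse_example123", "name": "AWS_Agent"}]},
]

def find_orphans(messages):
    """Tool call IDs that have no matching tool result in the history."""
    called = {tc["id"] for m in messages if m["type"] == "ai"
              for tc in m.get("tool_calls", [])}
    answered = {m["tool_call_id"] for m in messages if m["type"] == "tool"}
    return called - answered
```

On the next turn this set is non-empty, which is exactly the condition Bedrock's ValidationException complains about.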

2. Bedrock response_format Causes Prefill ValidationException

Symptom: Sub-agents using aws-bedrock provider fail with:

ValidationException: This model does not support assistant message prefill.
The conversation must end with a user message.

Root Cause: LangGraph's create_react_agent with response_format appends a hidden AIMessage prefill. Bedrock's Converse API does not support assistant message prefill, causing every structured response attempt to fail.

Impact: Sub-agents fall back to error handling, producing ResponseFormat orphaned tool calls that cascade into the supervisor.

Solution Architecture

Fix 1: Enhanced Orphaned Tool Call Repair

Location: ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py

The existing _repair_orphaned_tool_calls was enhanced to detect tool call IDs across all Bedrock-specific message formats:

def _extract_tool_call_ids(msg: BaseMessage) -> set:
    """Extract tool call IDs from all possible locations in an AIMessage.

    Bedrock stores tool_use IDs in three places:
    1. msg.tool_calls[*]["id"]                - standard LangChain format
    2. msg.additional_kwargs["tool_use"]      - Bedrock additional_kwargs
    3. msg.content[*] blocks with
       "type": "tool_use" and an "id" key     - Bedrock content block format
    """

Pre-fallback repair: Before entering fallback streaming mode, the supervisor now attempts orphan repair:

⚠️ Supervisor: Found 1 orphaned tool calls. IDs: ['tooluse_y6Ma...']
🔧 Will remove AIMessage with orphaned tool_call
✅ Supervisor: Removed 1 AIMessage(s) with orphaned tool calls

Force-repair: For persistent Bedrock errors, extracts tool_use IDs directly from the error message via regex and removes matching AIMessages from state.
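The ID extraction can be sketched with a regex like the following (the exact pattern in agent.py is not shown in this document, so treat this as illustrative):

```python
import re

# Illustrative pattern: Bedrock's ValidationException lists the offending
# tool_use IDs after "for the following Ids:". The error text below is a
# made-up example in that shape.
TOOL_USE_ID_RE = re.compile(r"tooluse_[A-Za-z0-9_-]+")

error_msg = (
    "ValidationException: Expected toolResult blocks at messages.0.content "
    "for the following Ids: tooluse_abc123, tooluse_def456"
)
orphan_ids = set(TOOL_USE_ID_RE.findall(error_msg))
```

Any AIMessage whose tool_calls intersect `orphan_ids` is then removed from the checkpointed state before the next LLM call.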

Fix 2: Bedrock response_format Bypass

Location: ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py, ai_platform_engineering/multi_agents/platform_engineer/deep_agent.py

When LLM_PROVIDER=aws-bedrock, the response_format parameter is omitted from create_react_agent and the format instructions are embedded directly in the system prompt instead. This prevents the prefill ValidationException at its source.
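A minimal sketch of the dispatch logic, with hypothetical helper, variable, and instruction text (the real code builds the create_react_agent kwargs elsewhere):

```python
# Hypothetical sketch: on Bedrock, omit response_format entirely and fold the
# formatting instructions into the system prompt instead.
def build_agent_kwargs(llm_provider, system_prompt, response_format):
    # Placeholder for instructions that would otherwise be derived from the
    # response_format schema.
    format_instructions = "Respond with JSON matching the agreed schema."
    if llm_provider == "aws-bedrock":
        # Converse API rejects assistant-message prefill, so steer the model
        # via the prompt rather than via response_format.
        return {"prompt": system_prompt + "\n\n" + format_instructions}
    return {"prompt": system_prompt, "response_format": response_format}
```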

Fix 3: Safe Summarization Boundary

Location: ai_platform_engineering/utils/a2a_common/langmem_utils.py

_find_safe_summarization_boundary was enhanced to prevent splitting tool_use / toolResult pairs during context compression. If a ToolMessage in the "keep" zone references a tool_call in the "summarize" zone, the boundary shifts to include the corresponding AIMessage.
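The rule can be sketched as follows, with plain dicts modeling the message list (illustrative only; the real boundary finder handles more cases): walk the proposed boundary backwards until every tool call on the summarize side also has its result on the summarize side.

```python
def find_safe_boundary(messages, proposed):
    """Shift the summarize/keep split left until no tool_use pair is divided.

    messages[:boundary] are summarized; messages[boundary:] are kept.
    """
    boundary = proposed
    while boundary > 0:
        pending = {tc["id"] for m in messages[:boundary] if m["type"] == "ai"
                   for tc in m.get("tool_calls", [])}
        answered = {m["tool_call_id"] for m in messages[:boundary]
                    if m["type"] == "tool"}
        if pending <= answered:  # no tool call left dangling across the split
            return boundary
        boundary -= 1  # pull the AIMessage back into the "keep" zone
    return 0
```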

Fix 4: OpenTelemetry Context Detach Noise Suppression

Location: cnoe-agent-utils/cnoe_agent_utils/tracing/decorators.py (separate repo) PR: cnoe-agent-utils#31

Added _quiet_span_exit() helper that temporarily raises the opentelemetry.context logger level to CRITICAL during span exit, preventing noisy ValueError: <Token var=<ContextVar...> was created in a different Context errors from polluting logs.
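A sketch of that helper as a context manager (assumed implementation; the actual code in cnoe-agent-utils may differ):

```python
import contextlib
import logging

@contextlib.contextmanager
def quiet_span_exit():
    """Temporarily silence the opentelemetry.context logger during span exit."""
    logger = logging.getLogger("opentelemetry.context")
    previous = logger.level
    logger.setLevel(logging.CRITICAL)
    try:
        yield
    finally:
        # Always restore the caller's level, even if span exit raises.
        logger.setLevel(previous)
```

Raising the level rather than detaching the handler keeps the change scoped and re-entrant, and the `finally` block guarantees the original level is restored.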

Reproduction and Verification

Multi-Turn Reproduction Test

The orphaned tool call issue is reproduced by sending 5+ turns to the supervisor using the same contextId, with queries that trigger sub-agent delegations:

CONTEXT_ID=$(python3 -c 'import uuid;print(uuid.uuid4())')

# Turn 1: GitHub sub-agent delegation
curl -sN -X POST http://localhost:8000 \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"jsonrpc":"2.0","id":"t1","method":"message/stream","params":{"message":{"role":"user","parts":[{"kind":"text","text":"List 2 recent open PRs for cnoe-io/ai-platform-engineering"}],"messageId":"m1","contextId":"'$CONTEXT_ID'"}}}'

# Turn 2: ArgoCD sub-agent (same context, builds history)
# Turn 3: Cross-reference (triggers summarization pressure)
# Turn 4: Context window check
# Turn 5: Another delegation to push context further

An automated integration test is available at integration/test_orphan_repair_multiturn.py:

PYTHONPATH=. uv run python integration/test_orphan_repair_multiturn.py
PYTHONPATH=. uv run python integration/test_orphan_repair_multiturn.py --turns 3

Verified Results (Feb 24, 2026)

Run 1: 5-turn test

| Turn | Query | Events | Status | Text |
|------|-------|--------|--------|------|
| 1 | List 2 PRs (GitHub) | 104 | completed | 2,926 chars |
| 2 | ArgoCD apps in caipe-preview | 430 | completed | 6,541 chars |
| 3 | Summarize PRs + ArgoCD | 1,178 | completed | 27,068 chars |
| 4 | Context window usage | 120 | completed | 2,223 chars |
| 5 | Failing ArgoCD apps | 118 | completed | 2,041 chars |

Orphan repair activated during Run 1 (confirmed conversation continued after repair):

⚠️ Supervisor: Found 1 orphaned tool calls. IDs: ['tooluse_y6Ma8ihoB4Lqbmm4bumT7p'], Names: ['AWS_Agent']
🔧 Will remove AIMessage with orphaned tool_call: msg_id=lc_run--019c919e...
✅ Supervisor: Removed 1 AIMessage(s) with orphaned tool calls. Earlier conversation history preserved.

Run 2: 10-turn stress test (GitHub + ArgoCD + Jira)

| Turn | Query | Events | Time | Status | Text |
|------|-------|--------|------|--------|------|
| 1 | List 5 recent open PRs (GitHub) | 126 | 20.4s | PASS | 3,889 chars |
| 2 | ArgoCD apps in caipe-preview | 188 | 21.3s | PASS | 5,988 chars |
| 3 | 5 most recent Jira tickets | 24 | 5.3s | TIMEOUT | 598 chars |
| 4 | Cross-reference PRs, ArgoCD, Jira | 517 | 19.4s | PASS | 7,725 chars |
| 5 | All open PRs across 2 repos | 890 | 45.2s | PASS | 42,259 chars |
| 6 | Combined status report | 1,307 | 59.4s | PASS | 25,814 chars |
| 7 | Failing/degraded ArgoCD apps (all namespaces) | 406 | 76.9s | PASS | 24,042 chars |
| 8 | Jira sprint tickets | 1,758 | 122.7s | PASS | 60,116 chars |
| 9 | Context window usage | 323 | 20.3s | PASS | 6,529 chars |
| 10 | Top 3 action items (cross-reference) | 1,428 | 54.3s | PASS | 20,887 chars |

10-turn summary: 9 completed, 0 failed, 1 timeout (Jira cold-start), 0 recovery failures, 0 fallback triggers.

No orphan repair was needed in Run 2, confirming that the upstream prevention fixes (Bedrock response_format bypass, safe summarization boundary) are effective at eliminating the root causes.

Error counts across both runs: zero "Recovery retry failed" messages and zero fallback triggers.

Unit Tests

50 unit tests in tests/test_supervisor_streaming_json_and_orphaned_tools.py:

| Test Class | Tests | Coverage |
|------------|-------|----------|
| TestExtractToolCallIds | 5 | Standard, additional_kwargs, content blocks, dedup |
| TestExtractToolCallIdsEdgeCases | 7 | camelCase, toolUseId variant, single dict, mixed, malformed |
| TestRepairOrphanedToolCalls | 4 | No orphans, orphan in tool_calls/kwargs/content |
| TestRepairOrphanedToolCallsEdgeCases | 6 | None state, empty messages, multiple orphans, partial |
| TestSafeSummarizationBoundary | 4 | Standard, kwargs, content block pairs, complete pairs |
| TestSummarizationBoundaryEdgeCases | 6 | Min keep, equal, no tools, multiple pending, cross-ref |
| TestForceRepairRegex | 6 | Bedrock format, LangGraph format, multiple IDs, hyphens |
| TestPreflightContextCheckNullQuery | 4 | None query, empty string, normal query |
| TestPreflightContextCheckEdgeCases | 6 | None state/values, no messages, threshold, exception |
| TestJsonScopingFix | 2 | No local json import, module-level callable |

PYTHONPATH=. uv run pytest tests/test_supervisor_streaming_json_and_orphaned_tools.py -v
# 50 passed in 3.68s

Files Changed

| File | Change |
|------|--------|
| ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py | Enhanced orphan repair, pre-fallback repair, force-repair |
| ai_platform_engineering/utils/a2a_common/langmem_utils.py | _extract_tool_call_ids, safe summarization boundary, query=None support |
| ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py | Bedrock response_format bypass, corporate CA bundle support for MCP HTTP transport |
| ai_platform_engineering/multi_agents/platform_engineer/deep_agent.py | Bedrock response_format bypass for supervisor graph |
| tests/test_supervisor_streaming_json_and_orphaned_tools.py | 50 unit tests |
| integration/test_orphan_repair_multiturn.py | 10-turn multi-turn integration test (GitHub, ArgoCD, Jira) |

Decision Rationale

Why repair at the supervisor level?

The orphaned tool call problem is inherent to LangGraph's checkpoint system with Bedrock. When a stream is cancelled, the checkpoint records the AIMessage with tool_calls but the ToolMessage response is never written. Repairing at the supervisor level (before the next LLM call) is the only place where we can access the checkpoint state and fix it before Bedrock rejects it.

Why embed response_format in system prompt for Bedrock?

Bedrock's Converse API fundamentally does not support assistant message prefill. LangGraph's create_react_agent uses prefill internally when response_format is set. Rather than patching LangGraph, we bypass the issue by embedding the format instructions in the system prompt -- achieving the same structured output behavior without triggering the prefill.

Why extract tool_call IDs from three locations?

Bedrock's Converse API stores tool_use information inconsistently across LangChain message formats. During normal operation, IDs appear in tool_calls. After checkpoint recovery, they may only exist in additional_kwargs or content blocks. Checking all three locations ensures no orphaned tool call is missed regardless of how the message was serialized.