OpenAI Response Deduplication — Investigation & Fix

Problem

When using OpenAI models (gpt-4o, gpt-5-mini, gpt-5.2) through CAIPE, the chat UI shows human-readable text followed by a raw JSON blob and then the same text repeated:

## Current weather — Allen, Texas, US
| Temperature | 77.1 °F |
...
{"status":"completed","message":"## Current weather — Allen, Texas, US\n..."}
## Current weather — Allen, Texas, US
...

Anthropic Claude and AWS Bedrock do not exhibit this behavior.

Root Cause

OpenAI's PlatformEngineerResponse structured output is streamed as plain message.content text, not as a tool call. The upstream agent.py in its post-stream parsing (PRIORITY 2 and 3 paths) calls handle_structured_response() but does not set from_response_format_tool = True on the resulting final_response dict.

This causes agent_executor.py to take the wrong code path (from_response_format_tool=False), falling back to _get_final_content() which joins all of supervisor_content — a mix of clean text and raw PlatformEngineerResponse JSON — producing duplicated output.

Why Bedrock/Claude are unaffected

Bedrock and Claude emit structured responses as tool calls, which are handled in the PRIORITY 1 path where from_response_format_tool is already correctly set to True. They never enter the PRIORITY 2/3 paths.

Key code locations

File	Path	What
`agent.py`	`multi_agents/platform_engineer/protocol_bindings/a2a/agent.py`	Streaming orchestration; PRIORITY 1/2/3 post-stream parsing
`agent_executor.py`	`multi_agents/platform_engineer/protocol_bindings/a2a/agent_executor.py`	Final content assembly; uses `from_response_format_tool` flag
`response_format.py`	`multi_agents/platform_engineer/response_format.py`	`PlatformEngineerResponse` Pydantic model
`deep_agent.py`	`multi_agents/platform_engineer/deep_agent.py`	`USE_STRUCTURED_RESPONSE` env var

Fix Applied

Location: agent.py, stream() method — PRIORITY 2 and PRIORITY 3 paths.

Change: After handle_structured_response() successfully parses a PlatformEngineerResponse (indicated by is_task_complete being non-None), set from_response_format_tool = True:

# PRIORITY 2 — using final AIMessage content
final_response = self.handle_structured_response(final_content)
if final_response.get('is_task_complete') is not None:
    final_response['from_response_format_tool'] = True

# PRIORITY 3 — using accumulated AI content (same fix)
final_response = self.handle_structured_response(accumulated_text)
if final_response.get('is_task_complete') is not None:
    final_response['from_response_format_tool'] = True

This directs agent_executor.py to use the clean parsed .message content from the structured response instead of the raw supervisor_content accumulation.

Deployment method

The fix is deployed via a Kubernetes ConfigMap (agent-fix) that replaces agent.py inside the supervisor pod:

ConfigMap: agent-fix  (from scripts/agent_fix.py)
Mount:     /app/ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py
Scope:     caipe-supervisor-agent deployment only

This is managed by setup-caipe.sh → post_deploy_patches() → _create_agent_fix_configmap() + _apply_agent_fix_volume().

Backward compatibility

Bedrock: Unaffected — uses PRIORITY 1 (tool-call) path; the fix code is never reached.
Claude: Unaffected — same reason as Bedrock.
OpenAI: Fixed — PRIORITY 2/3 paths now correctly set the flag.

Test Results

Models tested

Model	Prompt	Result
`gpt-4o`	"Weather in Allen, Texas"	Clean output, no JSON blobs
`gpt-4o`	"run network diagnostics on google.com"	Clean output, no duplication
`gpt-5-mini`	"Weather in Allen, Texas"	Clean output, no JSON blobs
`gpt-5-mini`	"run network diagnostics on google.com"	Clean output, no duplication
`gpt-5.2`	"Weather in Allen, Texas"	Clean output, no JSON blobs
`gpt-5.2`	"run network diagnostics on google.com"	Clean output, no duplication

Verification method

curl -s -X POST http://localhost:8000/ \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":"t","method":"message/send",
       "params":{"message":{"role":"user",
         "parts":[{"kind":"text","text":"Weather in Allen TX"}],
         "messageId":"t1"}}}' | python3 -c "
import sys, json
r = json.load(sys.stdin)
final = [a for a in r['result']['artifacts'] if a['name']=='final_result'][0]
text = ''.join(p['text'] for p in final['parts'] if p['kind']=='text')
has_json = '{\"status\"' in text
print('HAS RAW JSON:', has_json)
print(text[:500])
"

Expected: HAS RAW JSON: False

Approaches Considered and Rejected

1. Stripping JSON in `_get_final_content` (Option A)

Regex-based stripping of PlatformEngineerResponse JSON blobs from the joined supervisor_content. Rejected as fragile "monkey patching" — it treats symptoms rather than the root cause.

2. Filtering in `_handle_streaming_chunk` (Option B)

Skipping JSON blobs before appending to supervisor_content. Rejected for the same reason — operates on symptoms and is hard to maintain.

3. `USE_STRUCTURED_RESPONSE=false` (Option C)

Disables structured output entirely. Rejected as it changes response quality and removes a feature rather than fixing the integration.

4. Patching `_handle_task_complete` via `sitecustomize.py` (Fix 3)

The original sitecustomize.py approach patched _handle_task_complete to extract .message when from_response_format_tool=True. This failed because from_response_format_tool was never True for OpenAI — the exact root cause this investigation identified.

Current Patch State

Patch	ConfigMap	Scope	Status
Schema fix (`additionalProperties:false`)	`agent-patches`	All agents	Working
httpx redirect (`follow_redirects=True`)	`agent-patches`	All agents	Working
OpenAI response dedup (`from_response_format_tool=True`)	`agent-fix`	Supervisor	Working

File References

File	Description
`scripts/agent_fix.py`	Patched `agent.py` with the two-line fix
`scripts/agent_executor_fix.py`	Original `agent_executor.py` (unmodified, for reference)
`setup-caipe.sh` (repo root)	Deployment script managing ConfigMaps and patches
Upstream repo	ai-platform-engineering

Problem​

Root Cause​

Why Bedrock/Claude are unaffected​

Key code locations​

Fix Applied​

Deployment method​

Backward compatibility​

Test Results​

Models tested​

Verification method​

Approaches Considered and Rejected​

1. Stripping JSON in _get_final_content (Option A)​

2. Filtering in _handle_streaming_chunk (Option B)​

3. USE_STRUCTURED_RESPONSE=false (Option C)​

4. Patching _handle_task_complete via sitecustomize.py (Fix 3)​

Current Patch State​

File References​