Tasks: Unify Single-Node & Distributed Binding + RAG Reliability + Test Harness
Input: Design documents from docs/docs/specs/098-unify-single-distributed-binding/
Prerequisites: plan.md, spec.md, research.md, data-model.md, quickstart.md
Tests: Included β the test harness (Phase D) is a core deliverable of this spec.
Organization: Tasks are grouped by user story. US-1 through US-6 are already completed. Remaining work covers US-7 completion and the full test harness (Phase D).
Format: [ID] [P?] [Story] Descriptionβ
- [P]: Can run in parallel (different files, no dependencies)
- [Story]: Which user story this task belongs to (e.g., US1, US7)
- Include exact file paths in descriptions
Path Conventionsβ
- Supervisor backend:
ai_platform_engineering/multi_agents/platform_engineer/ - A2A binding:
ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/ - Slack bot:
ai_platform_engineering/integrations/slack_bot/ - Unit tests:
tests/ - Integration tests:
integration/ - Fixtures:
tests/fixtures/
Phase 1: Completed Work (US-1 through US-6) β β
These phases are already implemented and tracked in the spec's Implementation Progress table. Listed here for completeness and dependency tracking.
- T001 [US1] Merge
deep_agent_single.pyintodeep_agent.pyinai_platform_engineering/multi_agents/platform_engineer/deep_agent.py - T002 [US1] Merge
agent_single.pyintoagent.pyinai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py - T003 [US1] Merge
agent_executor_single.pyintoagent_executor.py - T004 [US1] Merge
main_single.pyintomain.py - T005 [US1] Merge Docker Compose into single
caipe-supervisorservice indocker-compose.dev.yaml - T006 [US1] Auto-enable connectivity check when
DISTRIBUTED_AGENTSis set inai_platform_engineering/multi_agents/agent_registry.py - T007 [US5] Implement
_get_distributed_agents()and_agent_is_distributed()inai_platform_engineering/multi_agents/platform_engineer/deep_agent.py - T008 [US5] Unit tests for per-agent distribution in
tests/test_distributed_agents.py - T009 [US2] Tool narration with correct subagent name extraction in
ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py - T010 [US2] Unit tests for tool narration in
tests/test_streaming_narration.py - T011 [US6] Implement
FetchDocumentCapWrapperandSearchCapWrapperinai_platform_engineering/multi_agents/platform_engineer/rag_tools.py - T012 [US6] Wire cap wrappers into tool loading in
ai_platform_engineering/multi_agents/platform_engineer/deep_agent.py - T013 [US6] Configurable
LANGGRAPH_RECURSION_LIMITinai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py - T014 [US6] Configurable recursion limit in
ai_platform_engineering/utils/a2a_common/base_langgraph_agent.py - T015 [US6]
GraphRecursionErrorisinstance detection inai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py - T016 [US6] Update RAG prompt instructions in
charts/ai-platform-engineering/data/prompt_config.deep_agent.yaml - T017 [US6] RAG env vars in
docker-compose.dev.yaml - T018 [US6] Unit tests for RAG cap wrappers in
tests/test_rag_tools_hard_stop.py - T019 [US7] Remove
continuesuppressing intermediate narrative inai_platform_engineering/integrations/slack_bot/utils/ai.py - T020 [US7] Add RAG tool exclusion from echo suppression in
ai_platform_engineering/integrations/slack_bot/utils/ai.py
Checkpoint: US-1 through US-6 fully implemented. US-7 partially complete (narrative streaming works, final answer delivery pending).
Phase 2: US-7 Completion β Final Answer Delivery (Priority: P1)β
Goal: Fix the final synthesized answer being cleared before the FINAL_RESULT event, so Slack receives the actual answer.
Independent Test: Ask a RAG query via Slack and verify the final answer text is non-empty and contains the synthesized content (not "I've completed your request." with no body).
Implementation for US-7 Completionβ
- T021 [US7] Fix
final_response['content'] = ''to includeresponse_format_resultcontent inai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py(~line 1862) - T022 [US7] Verify edge case: no
response_format_resultβ ensure content falls back to accumulated model output; addedfinal_model_contentfallback fromresponse_format_result(~line 1922)
Checkpoint: SC-012 satisfied β final synthesized answer arrives in Slack as non-empty FINAL_RESULT artifact.
Phase 3: Test Harness β JSON Fixtures (Phase D.5)β
Goal: Create deterministic test fixtures for use by all subsequent test phases.
Independent Test: Fixtures load successfully and conform to expected schemas.
- T023 [P] Create
tests/fixtures/directory structure - T024 [P] Create canned RAG search response fixture in
tests/fixtures/rag_search_response.json(3 results with document IDs, scores, snippets) - T025 [P] Create canned fetch_document response fixture in
tests/fixtures/rag_fetch_document_response.json(single document, ~5K chars) - T026 [P] Create canned A2A agent card fixture in
tests/fixtures/a2a_agent_card.json(minimal valid agent card with skills, capabilities) - T027 [P] Create canned A2A task SSE stream fixture in
tests/fixtures/a2a_task_sse_stream.json(complete task lifecycle: statusβnotificationβstreamingβfinal_resultβcompleted)
Checkpoint: All 4 fixture files exist and are valid JSON.
Phase 4: Test Harness β FINAL_RESULT Content Tests (Phase D.3)β
Goal: Verify the A2A binding correctly includes response_format_result content in the FINAL_RESULT artifact. Validates the Phase 2 fix.
Independent Test: pytest tests/test_final_result_content.py -v β all tests pass.
- T028 [US7] Create test file with 26 tests in
ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/tests/test_final_result_content.py - T029 [US7] Test:
response_format_resultpresent βfinal_response['content']is non-empty (+ dedup survival, Bedrock list format, require_user_input) - T030 [US7] Test: no
response_format_resultβ content from accumulated model output (+ empty fallback) - T031 [US7] Test: yielded
FINAL_RESULTevent carries full synthesized answer viafinal_model_content(+ trace_id propagation, regression guards)
Checkpoint: FINAL_RESULT content logic verified by 3+ test cases.
Phase 5: Test Harness β A2A Binding Streaming Events (Phase D.1)β
Goal: Verify the A2A binding yields correct streaming events (tool notifications, execution plans, narrative, final result) with proper artifact names and content.
Independent Test: pytest tests/test_binding_streaming_events.py -v β all tests pass.
- T032 [P] [US2] Create test file with executor mock helpers in
protocol_bindings/a2a/tests/test_binding_streaming_events.py - T033 [US2] Test:
TOOL_NOTIFICATION_STARThas correctsource_agent(not "task") β 3 tests for github, write_todos, tool_result - T034 [US2] Test:
EXECUTION_PLAN_UPDATEpresent with agent-tagged steps β executor plan artifact emission - T035 [US7] Test:
STREAMING_RESULTevents contain narrative text β 3 tests for content, append, post-subagent - T036 [US7] Test:
FINAL_RESULThas non-empty content β 3 tests for final_model_content, accumulated, stream_end - T037 [US7] Test: event ordering β notifications before streaming before final_result
Checkpoint: All A2A binding streaming event assertions pass.
Phase 6: Test Harness β Slack Narrative & Echo Suppression (Phase D.2)β
Goal: Verify Slack bot correctly streams narrative text, suppresses non-RAG post-tool echoes, and passes through RAG post-tool text.
Independent Test: pytest tests/test_slack_narrative_streaming.py -v β all tests pass.
- T038 [P] [US7] Create test file in
tests/test_slack_narrative_streaming.pyβ 11 tests covering event parsing, RAG exclusion, final text priority - T039 [US7] Test: EventType parsing for STREAMING_RESULT, FINAL_RESULT, TOOL_NOTIFICATION_START/END
- T040 [US7] Test: RAG tools (search, fetch_document, list_datasources, fetch_url) in RAG_TOOL_NAMES set
- T041 [US7] Test: non-RAG tools (github, argocd) NOT in RAG_TOOL_NAMES set
- T042 [US7] Test: _get_final_text priority β FINAL_RESULT > PARTIAL_RESULT > MESSAGE > artifacts
- T043 [US7] Test: empty sources fallback behavior in _get_final_text
Checkpoint: All Slack narrative streaming and echo suppression assertions pass.
Phase 7: Test Harness β Distributed Mode Binding (Phase D.4)β
Goal: Verify the distributed A2A path works correctly with mocked HTTP responses, without real agent containers.
Independent Test: pytest tests/test_distributed_mode_binding.py -v β all tests pass.
- T044 [P] [US5] Create test file in
tests/test_distributed_mode_binding.pyβ 9 tests for parsing edge cases and routing - T045 [US5] Test: all+other tokens β __all__, mixed-case "all", 200-name list parsing
- T046 [US5] Test: _agent_is_distributed routing for github with various distributed sets
- T047 [US5] Test: ENABLE_GITHUB=false drops agent before remote/local split (edge case)
- T048 [US5] Test:
DISTRIBUTED_AGENTS=argocdroutes argocd to remote, github to local; all vs none scenarios
Checkpoint: Distributed mode fully testable without Docker.
Phase 8: Test Harness β Integration Tests (Phase D.6)β
Goal: End-to-end streaming validation using Docker Compose test profile.
Independent Test: Start minimal Docker environment, run pytest integration/test_streaming_harness.py -v.
Prerequisites: Docker Compose with --profile github --profile netutils-agent
- T049 Create integration test file in
integration/test_streaming_harness.py - T050 Test: supervisor starts and responds to A2A health check (agent card validation)
- T051 Test: tool notifications contain correct agent names (not "task") β sourceAgent metadata assertion
- T052 Test: streaming events received via SSE in correct order β streaming_result before final_result
- T053 Test: final result artifact has non-empty content β length > 10 chars assertion
Checkpoint: End-to-end streaming validated through real Docker containers.
Phase 9: Polish & Cross-Cutting Concernsβ
Purpose: Finalize documentation, verify all tests pass together, clean up.
- T054 [P] Run full test suite: 193 tests passed, 0 failures
- T055 [P] Run linting: ruff check passed on all changed files (fixed 3 issues: unused imports, line length)
- T056 Update spec.md Implementation Progress table to reflect all completed work
- T057 Verify quickstart.md test commands β unit test suite verified working
Checkpoint: All quality gates pass. Ready for PR review.
Dependencies & Execution Orderβ
Phase Dependenciesβ
- Phase 1 (Completed): No dependencies β
- Phase 2 (US-7 fix): Depends on Phase 1 β BLOCKS Phase 4 (FINAL_RESULT tests)
- Phase 3 (Fixtures): No dependencies β can start immediately, in parallel with Phase 2
- Phase 4 (FINAL_RESULT tests): Depends on Phase 2 (tests validate the fix) + Phase 3 (fixtures)
- Phase 5 (Binding events): Depends on Phase 3 (fixtures) β can parallelize with Phase 4
- Phase 6 (Slack narrative): Depends on Phase 3 (fixtures) β can parallelize with Phase 4, 5
- Phase 7 (Distributed mode): Depends on Phase 3 (fixtures) β can parallelize with Phase 4, 5, 6
- Phase 8 (Integration): Depends on Phases 2, 4, 5, 6 β needs all unit tests passing first
- Phase 9 (Polish): Depends on all previous phases
User Story Dependenciesβ
- US-7 (Phase 2): One remaining task β fix
final_response['content']. Independent of other stories. - Test Harness (Phases 3-8): Validates US-1 through US-7. Independent test files, no cross-story blocking.
Within Each Test Phaseβ
- Create test file skeleton β add individual test cases β run and verify
- Fixtures must exist before tests that reference them
Parallel Opportunitiesβ
Phase 2 (US-7 fix) βββββββββββββββββββ
ββββ Phase 4 (FINAL_RESULT tests)
Phase 3 (Fixtures) ββββ¬βββ Phase 5 βββ€
ββββ Phase 6 βββ€
ββββ Phase 7 βββ€
ββββ Phase 8 (Integration)
ββββ Phase 9 (Polish)
- Phases 5, 6, 7 can ALL run in parallel (different test files, no shared state)
- Phase 3 (fixtures) can run in parallel with Phase 2 (US-7 fix)
- Within each phase, tasks marked [P] can run in parallel
Parallel Example: Test Harness Fixtures (Phase 3)β
# All fixture files can be created in parallel:
Task: "Create tests/fixtures/rag_search_response.json"
Task: "Create tests/fixtures/rag_fetch_document_response.json"
Task: "Create tests/fixtures/a2a_agent_card.json"
Task: "Create tests/fixtures/a2a_task_sse_stream.json"
Parallel Example: Test Files (Phases 5-7)β
# After fixtures exist, all test files can be worked on in parallel:
Task: "tests/test_binding_streaming_events.py" # Phase 5
Task: "tests/test_slack_narrative_streaming.py" # Phase 6
Task: "tests/test_distributed_mode_binding.py" # Phase 7
Implementation Strategyβ
MVP First (Phase 2 Only)β
- Fix
final_response['content']inagent.py(T021-T022) - STOP and VALIDATE: Test via Slack β final answer should now appear
- Deploy to dev environment for user validation
Incremental Deliveryβ
- Phase 2 β Fix final answer β Validate in Slack (critical bug fix)
- Phase 3 β Create fixtures β Foundation for all test work
- Phases 4-7 β Unit tests β Fast CI coverage (can parallelize)
- Phase 8 β Integration test β End-to-end confidence
- Phase 9 β Polish β PR ready
Parallel Team Strategyβ
With 3 developers after Phase 3 completes:
- Developer A: Phase 5 (binding events) + Phase 4 (FINAL_RESULT)
- Developer B: Phase 6 (Slack narrative)
- Developer C: Phase 7 (distributed mode)
- All converge: Phase 8 (integration) β Phase 9 (polish)
Notesβ
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- Completed tasks (Phases 1-9) are marked
[x]β all 57 tasks done - Streaming conformance tasks (formerly Phase 10) moved to
.specify/specs/099-slack-streaming-conformance/tasks.md - All tests use
asyncio.run()wrappers (nopytest-asynciodependency) - Mock patterns follow existing
test_rag_tools_hard_stop.pyandslack_bot/tests/test_a2a_streaming.py