Tasks: Slack Streaming Conformance — Artifact ID Reset, No-Tool Queries, RAG Cap Isolation

Input: Spec from .specify/specs/099-slack-streaming-conformance/spec.md
Related: 098-unify-single-distributed-binding (parent spec: binding unification, RAG caps)

Organization: Tasks migrated from 098 Phase 10, organized by user story. All tasks are completed.

Format: `[ID] [P?] [Story] Description`

[P]: Can run in parallel (different files, no dependencies)
[Story]: Which user story this task belongs to (e.g., US1, US4)
Include exact file paths in descriptions

Path Conventions

A2A binding: ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/
Slack bot: ai_platform_engineering/integrations/slack_bot/
RAG tools: ai_platform_engineering/multi_agents/platform_engineer/rag_tools.py
Middleware: ai_platform_engineering/utils/deepagents_custom/middleware.py
Prompts: charts/ai-platform-engineering/data/
Tests: tests/

Phase 1: Live Final-Answer Streaming & Metadata Propagation ✅

Goal: Restore word-by-word streaming of the final answer to Slack and propagate is_final_answer metadata through the executor.

T001 [US1] Restore live post-marker token yield with is_final_answer: True metadata in ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py (lines ~1054-1061)
T002 [US1] Propagate is_final_answer from event to artifact metadata in ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent_executor.py (lines ~779-781)

Checkpoint: STREAMING_RESULT events carry is_final_answer=True in metadata; executor propagates to artifacts.
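
The propagation in T002 can be pictured with a minimal sketch. The event shape and the helper name `build_artifact_metadata` are assumptions made for illustration; the actual code in agent_executor.py may be structured differently.

```python
from typing import Any


def build_artifact_metadata(event: dict[str, Any]) -> dict[str, Any]:
    """Copy the is_final_answer flag from a streaming event into artifact metadata.

    Hypothetical helper: the event is assumed to be a dict with an optional
    "metadata" dict, which is not confirmed by the spec.
    """
    event_meta = event.get("metadata") or {}
    artifact_meta: dict[str, Any] = {}
    # Only set the flag when the event explicitly marks a final-answer token.
    if event_meta.get("is_final_answer"):
        artifact_meta["is_final_answer"] = True
    return artifact_meta


# A post-marker token event carries the flag; a tool notification does not.
streaming_event = {"content": "Hello", "metadata": {"is_final_answer": True}}
tool_event = {"content": "calling search", "metadata": {}}
```

With this shape, a downstream consumer such as the Slack bot only needs to read the artifact metadata to know whether a chunk belongs to the final answer.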

Phase 2: Per-Tool RAG Cap Tracking ✅

Goal: Replace global RAG hard-stop with per-tool cap tracking to prevent premature graph termination.

T003 [US2] Replace global _rag_hard_stop_set with _rag_capped_tools: dict[str, set[str]] in ai_platform_engineering/multi_agents/platform_engineer/rag_tools.py
T004 [US2] Add is_rag_tool_capped(thread_id, tool_name) function in rag_tools.py
T005 [US2] Update _record_rag_cap_hit to accept tool_name argument in rag_tools.py
T006 [US2] Update DeterministicTaskMiddleware.after_model to use per-tool cap checks in ai_platform_engineering/utils/deepagents_custom/middleware.py
T007 [US2] Update tests/test_rag_tools_hard_stop.py for new cap tracking structure; add test_is_rag_tool_capped_tracks_individual_tools

Checkpoint: Capping search does not block fetch_document; graph only terminates when all requested tools are individually capped. SC-003, SC-004 satisfied.

Phase 3: No-Tool Query Stream Opening ✅

Goal: Ensure simple/off-topic queries (no tool calls) deliver answers to Slack.

T008 [US3] Fix pre-stream typing guard to `if not stream_ts and not streaming_final_answer:` in ai_platform_engineering/integrations/slack_bot/utils/ai.py (line ~401)

Checkpoint: "tell me a joke" and "how is the weather in SF?" produce visible Slack output. SC-002 satisfied.
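
One plausible reading of the guard in T008 is sketched below. Only the condition `not stream_ts and not streaming_final_answer` comes from the task; the `handle_chunk` function, the state dict, and the placeholder timestamp are assumptions for illustration and do not reproduce ai.py.

```python
from typing import Any, Optional


def new_state() -> dict[str, Any]:
    """Hypothetical per-message state for the Slack stream handler."""
    return {"stream_ts": None, "streaming_final_answer": False, "delivered": []}


def handle_chunk(state: dict[str, Any], text: str, is_final_answer: bool) -> str:
    """Route one chunk: show a typing status pre-stream, or deliver it to the stream."""
    if is_final_answer:
        state["streaming_final_answer"] = True

    # Pre-stream guard: only hold the chunk back while no stream is open
    # AND no final-answer token has arrived yet.
    if not state["stream_ts"] and not state["streaming_final_answer"]:
        return "typing"

    # A final-answer token with no open stream (no-tool query) opens one here.
    if not state["stream_ts"]:
        state["stream_ts"] = "ts-placeholder"  # stands in for a real Slack stream
    state["delivered"].append(text)
    return "streamed"


no_tool_state = new_state()
first = handle_chunk(no_tool_state, "Why did the chicken", is_final_answer=True)

tool_state = new_state()
progress = handle_chunk(tool_state, "searching...", is_final_answer=False)
```

Under this reading, a no-tool query opens the stream on its first final-answer token instead of being held behind the typing status.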

Phase 4: RAG Parallel Tool Call Prompts ✅

Goal: Hint the LLM to issue multiple RAG tool calls per response for faster retrieval.

T009 [P] [US5] Add parallel tool call hint to search_tool_prompt in charts/ai-platform-engineering/data/prompt_config.rag.yaml
T010 [P] [US5] Add parallel tool call hint to RAG instructions in charts/ai-platform-engineering/data/prompt_config.deep_agent.yaml

Checkpoint: RAG prompts include "Parallel Tool Calls" hint section.

Phase 5: Conformance Test Suite ✅

Goal: Build an automated suite that validates all Slack streaming scenarios end-to-end.

T011 [US4] Add --suite mode to tests/simulate_slack_stream.py with run_suite() function
T012 [US4] Define 4 conformance scenarios: simple-chat, off-topic, rag-simple, rag-complex
T013 [US4] Implement conformance checks: content_delivered, stream_opened, live_streamed, no_duplicate, final_answer_latched, tools_used, multi_chunk, no_tools
T014 [US4] Modify simulate() to return structured result dict for suite consumption

Checkpoint: All 22 conformance checks pass. SC-001 satisfied.
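
The check names in T013 lend themselves to a small registry evaluated against the structured result from T014. The sketch below covers five of the eight checks; the result fields (`final_text`, `stream_opened`, `append_count`, `tool_calls`) and the per-scenario check selection are assumptions, not the layout used by simulate_slack_stream.py.

```python
from typing import Any, Callable

Result = dict[str, Any]

# Check name -> predicate over the structured result dict (fields are assumed).
CHECKS: dict[str, Callable[[Result], bool]] = {
    "content_delivered": lambda r: bool(r["final_text"].strip()),
    "stream_opened": lambda r: r["stream_opened"],
    "multi_chunk": lambda r: r["append_count"] > 1,
    "tools_used": lambda r: len(r["tool_calls"]) > 0,
    "no_tools": lambda r: len(r["tool_calls"]) == 0,
}


def run_checks(result: Result, check_names: list[str]) -> dict[str, bool]:
    """Evaluate the named checks against one scenario result."""
    return {name: bool(CHECKS[name](result)) for name in check_names}


# Hypothetical result for the simple-chat scenario (no tool calls expected).
simple_chat_result = {
    "final_text": "Why did the chicken cross the road?",
    "stream_opened": True,
    "append_count": 7,
    "tool_calls": [],
}
outcome = run_checks(
    simple_chat_result, ["content_delivered", "stream_opened", "no_tools"]
)
```

Selecting checks per scenario keeps mutually exclusive checks such as tools_used and no_tools from being applied to the same run.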

Phase 6: Benchmark & Enforcement ✅

Goal: Define the formal benchmark and automate enforcement via a Cursor rule.

T015 [US4] Create tests/STREAMING_CONFORMANCE.md with scenarios, invariants, scope, and instructions for adding new scenarios (SC-006)
T016 [US4] Create .cursor/rules/streaming-conformance.mdc glob-matched to 7 pipeline files (SC-005)

Checkpoint: Benchmark document exists; Cursor rule triggers on pipeline edits.

Dependencies & Execution Order

Phase 1 (live streaming)  ──→ Phase 3 (no-tool fix) ──→ Phase 5 (conformance suite)
                                                         ↑
Phase 2 (RAG caps)  ─────────────────────────────────────┘
                                                         ↓
Phase 4 (prompts)  ──────────────────────────────────→ Phase 6 (benchmark)

Phases 1, 2, 4 can run in parallel (different files, no shared state)
Phase 3 depends on Phase 1 (is_final_answer metadata must exist)
Phase 5 depends on Phases 1, 2, 3 (validates all fixes)
Phase 6 depends on Phase 5 (benchmark references the suite)
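
The ordering stated in the list above can be checked mechanically. The sketch below encodes only those four statements and groups phases into waves that could run in parallel, using the standard-library `graphlib` module; the `execution_waves` helper is illustrative and not part of the repository.

```python
from graphlib import TopologicalSorter

# phase -> phases it depends on, taken from the dependency list above
DEPENDS_ON: dict[int, set[int]] = {
    1: set(),
    2: set(),
    3: {1},
    4: set(),
    5: {1, 2, 3},
    6: {5},
}


def execution_waves(depends_on: dict[int, set[int]]) -> list[list[int]]:
    """Group phases into waves; every phase in a wave has all dependencies done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: list[list[int]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(DEPENDS_ON)
print(waves)  # [[1, 2, 4], [3], [5], [6]]
```

The first wave confirms that Phases 1, 2, and 4 have no prerequisites, and Phase 6 lands last because it waits on the conformance suite.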

Notes

All 16 tasks (T001–T016) are completed
All tasks use exact file paths for traceability
These tasks were originally Phase 10 / T058–T071 in the 098 spec
The conformance test suite needs a running supervisor; run the suite with `PYTHONPATH=. uv run python tests/simulate_slack_stream.py --suite`
