# Architecture: Automatic Fact Extraction via LangMem

## Decision
Use LangMem's `create_memory_store_manager` API to automatically extract facts from every conversation turn, running as a background task after the agent responds.
| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| `create_memory_store_manager` (chosen) | Native `BaseStore` integration, automatic search/insert/update/delete, consolidates duplicates | Extra LLM call per turn | Selected |
| `create_memory_manager` + manual store writes | More control over extraction output | No store integration; must manually search/write/deduplicate | Rejected |
| Inline extraction during compression only | No extra LLM calls | Facts captured only when the context window is near full; short conversations produce no memories | Rejected (status quo) |
| Custom fact extraction prompt | Full control over the prompt | Reinvents what LangMem already provides; no dedup/update logic | Rejected |
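The chosen manager's search/analyze/upsert cycle can be illustrated with a minimal pure-Python sketch. All names below (`extract_facts`, `consolidate`, the dict store) are hypothetical stand-ins for behavior the decision describes, not LangMem's actual API:

```python
# Illustrative sketch of the search -> analyze -> insert/update cycle the
# chosen manager performs. Not LangMem's real API; a dict stands in for
# the BaseStore and extract_facts stands in for the LLM extraction call.

def extract_facts(messages):
    # Stand-in for the LLM extraction step: returns (key, fact) pairs.
    return [("deploy_tool", "User's team uses ArgoCD")]

def consolidate(store, namespace, messages):
    existing = store.setdefault(namespace, {})   # "search" for prior memories
    for key, fact in extract_facts(messages):
        existing[key] = fact                     # insert, or update a duplicate
    return existing

store = {}
consolidate(store, ("memories", "user-1"),
            [{"role": "user", "content": "We use ArgoCD"}])
```

Keying facts by a stable identifier is what lets repeated mentions update one record instead of accumulating duplicates, which is the consolidation behavior called out in the table above.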
## Solution Architecture

### Background Extraction Flow
- User sends a message; the agent streams the response back to the user.
- After the response stream completes, a background `asyncio.create_task()` launches fact extraction.
- The `MemoryStoreManager` searches the store for existing memories relevant to this conversation.
- It analyzes the conversation messages alongside existing memories using an LLM.
- It generates insert/update/delete operations and applies them to the store.
- On the next conversation (same or new thread), `store_get_cross_thread_context` retrieves these facts.
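The flow above can be sketched with `asyncio` alone. The `stream_response` and `extract_facts` coroutines are hypothetical placeholders for the agent's reply and the memory manager's background pass:

```python
import asyncio

async def stream_response(message: str) -> str:
    # Placeholder for the agent's streaming reply.
    return f"answer to: {message}"

async def extract_facts(messages: list, store: dict) -> None:
    # Placeholder for the MemoryStoreManager's background extraction pass.
    store["facts"] = list(messages)

async def handle_turn(message: str, store: dict) -> str:
    reply = await stream_response(message)
    # Extraction is scheduled only after the response completes, so it
    # stays off the user-facing latency path.
    task = asyncio.create_task(extract_facts([message, reply], store))
    await task  # in production the task runs in the background, not awaited here
    return reply

store: dict = {}
reply = asyncio.run(handle_turn("hello", store))
```

The key ordering property is that `asyncio.create_task()` is called after the stream finishes, so the extra LLM call never delays the user's answer.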
### Memory Types Extracted
LangMem's memory manager extracts three categories:
- Semantic: Facts, preferences, relationships (e.g., "User's team uses ArgoCD on prod-west cluster")
- Episodic: Past experiences and conversation context (e.g., "User debugged an OOM issue in the monitoring namespace")
- Procedural: Behavioral patterns (e.g., "User prefers concise responses with YAML examples")
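One way to picture these categories is a tagged record type. The `Memory` dataclass below is purely illustrative; LangMem's actual record shape may differ:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str      # one of "semantic", "episodic", "procedural"
    content: str

# The three example memories from the categories above.
memories = [
    Memory("semantic", "User's team uses ArgoCD on prod-west cluster"),
    Memory("episodic", "User debugged an OOM issue in the monitoring namespace"),
    Memory("procedural", "User prefers concise responses with YAML examples"),
]

semantic = [m.content for m in memories if m.kind == "semantic"]
```

Tagging records by kind lets a retrieval layer weight them differently, e.g. always injecting procedural preferences while searching semantic facts by relevance.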
### Feature Flag

Extraction is controlled by `ENABLE_FACT_EXTRACTION` (default: `false`). It is disabled by default because:
- It adds one LLM call per conversation turn (cost/latency consideration)
- Teams should opt-in after evaluating cost vs. benefit
- InMemoryStore (default) loses data on restart, so extraction is most useful with Redis/Postgres store backends
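A minimal sketch of the opt-in gate, assuming the flag is read from the environment (the helper names and the accepted truthy values are assumptions, not the actual implementation):

```python
import os

def fact_extraction_enabled() -> bool:
    # Disabled unless ENABLE_FACT_EXTRACTION is explicitly set truthy.
    return os.getenv("ENABLE_FACT_EXTRACTION", "false").lower() in ("1", "true", "yes")

def maybe_schedule_extraction(schedule) -> bool:
    if not fact_extraction_enabled():
        return False   # opt-in: skip the extra LLM call by default
    schedule()
    return True

os.environ["ENABLE_FACT_EXTRACTION"] = "false"
assert maybe_schedule_extraction(lambda: None) is False

os.environ["ENABLE_FACT_EXTRACTION"] = "true"
assert maybe_schedule_extraction(lambda: None) is True
```

Checking the flag at scheduling time (rather than at startup) means teams can toggle extraction per deployment without code changes.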
### Configuration

```shell
ENABLE_FACT_EXTRACTION=false   # Enable/disable background fact extraction
FACT_EXTRACTION_MODEL=         # Model for extraction (empty = use the default LLM)
```
## Components Changed

- `ai_platform_engineering/utils/agent_memory/fact_extraction.py` (new) - Extraction logic, feature flag, `MemoryStoreManager` factory
- `ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py` - Background task launch after `stream()`
- `ai_platform_engineering/multi_agents/agent_registry.py` - Added `FACT_EXTRACTION` to `DEFAULT_REGISTRY_EXCLUSIONS`
- `.env.example` / `docker-compose.dev.yaml` - New environment variables
- cross-thread-store spec - Updated spec with Phase 4
- `tests/test_fact_extraction.py` (new) - 65 unit tests covering the feature flag, config, extraction, store compatibility, and edge cases
- `tests/test_store.py` (enhanced) - 86 unit tests covering the store factory, operations, cross-thread context, and integration
- `tests/test_checkpoint.py` (new) - 49 unit tests covering checkpointer lifecycle, thread isolation, state round-trip, auto-trim, safe split, repair fallback, concurrent access, and agent wiring
- `integration/test_fact_extraction_live.py` (new) - Live integration test with recall verification and user isolation
## Dependency: cnoe-agent-utils

The `trace_agent_stream` decorator in cnoe-agent-utils required a fix to forward `**kwargs` so that `user_id` can be propagated through the decorated `stream()` method. This fix is backward-compatible.
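The class of bug being fixed can be shown with a generic tracing decorator (this is a hypothetical reconstruction of the pattern, not the actual cnoe-agent-utils code): if the wrapper accepts only positional arguments, keyword arguments like `user_id` are silently dropped.

```python
import functools

def trace_stream(fn):
    # Correct form: the wrapper accepts and forwards **kwargs, so
    # keyword arguments such as user_id reach the wrapped function.
    # A wrapper written as `def wrapper(*args):` would drop them.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # (tracing/telemetry logic would go here)
        return fn(*args, **kwargs)
    return wrapper

@trace_stream
def stream(query, *, user_id=None):
    return {"query": query, "user_id": user_id}

result = stream("list pods", user_id="alice")
```

Because the fixed wrapper passes through anything it does not recognize, the change is backward-compatible with callers that never supplied keyword arguments.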
## Related
- Spec: spec.md