# Architecture: Automatic Fact Extraction via LangMem

## Decision
Use LangMem's `create_memory_store_manager` API to automatically extract facts from every conversation turn, running as a background task after the agent responds.
| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| `create_memory_store_manager` (chosen) | Native `BaseStore` integration, automatic search/insert/update/delete, consolidates duplicates | Extra LLM call per turn | Selected |
| `create_memory_manager` + manual store writes | More control over extraction output | No store integration; must manually search/write/deduplicate | Rejected |
| Inline extraction during compression only | No extra LLM calls | Facts captured only when the context window is near full; short conversations produce no memories | Rejected (status quo) |
| Custom fact extraction prompt | Full control over the prompt | Reinvents what LangMem already provides; no dedup/update logic | Rejected |
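The chosen manager's search/analyze/upsert cycle can be illustrated with a minimal pure-Python sketch. All names below (`extract_facts`, `consolidate`, the dict store) are hypothetical stand-ins for behavior the decision describes, not LangMem's actual API:

```python
# Illustrative sketch of the search -> analyze -> insert/update cycle the
# chosen manager performs. Not LangMem's real API; a dict stands in for
# the BaseStore and extract_facts stands in for the LLM extraction call.

def extract_facts(messages):
    # Stand-in for the LLM extraction step: returns (key, fact) pairs.
    return [("deploy_tool", "User's team uses ArgoCD")]

def consolidate(store, namespace, messages):
    existing = store.setdefault(namespace, {})   # "search" for prior memories
    for key, fact in extract_facts(messages):
        existing[key] = fact                     # insert, or update a duplicate
    return existing

store = {}
consolidate(store, ("memories", "user-1"),
            [{"role": "user", "content": "We use ArgoCD"}])
```

Keying facts by a stable identifier is what lets repeated mentions update one record instead of accumulating duplicates, which is the consolidation behavior called out in the table above.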
## Solution Architecture

### Background Extraction Flow
- User sends a message; the agent streams the response back to the user.
- After the response stream completes, a background `asyncio.create_task()` launches fact extraction.
- The `MemoryStoreManager` searches the store for existing memories relevant to this conversation.
- It analyzes the conversation messages alongside existing memories using an LLM.
- It generates insert/update/delete operations and applies them to the store.
- On the next conversation (same or new thread), `store_get_cross_thread_context` retrieves these facts.
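The flow above can be sketched with `asyncio` alone. The `stream_response` and `extract_facts` coroutines are hypothetical placeholders for the agent's reply and the memory manager's background pass:

```python
import asyncio

async def stream_response(message: str) -> str:
    # Placeholder for the agent's streaming reply.
    return f"answer to: {message}"

async def extract_facts(messages: list, store: dict) -> None:
    # Placeholder for the MemoryStoreManager's background extraction pass.
    store["facts"] = list(messages)

async def handle_turn(message: str, store: dict) -> str:
    reply = await stream_response(message)
    # Extraction is scheduled only after the response completes, so it
    # stays off the user-facing latency path.
    task = asyncio.create_task(extract_facts([message, reply], store))
    await task  # in production the task runs in the background, not awaited here
    return reply

store: dict = {}
reply = asyncio.run(handle_turn("hello", store))
```

The key ordering property is that `asyncio.create_task()` is called after the stream finishes, so the extra LLM call never delays the user's answer.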
### Memory Types Extracted
LangMem's memory manager extracts three categories:
- Semantic: Facts, preferences, relationships (e.g., "User's team uses ArgoCD on prod-west cluster")
- Episodic: Past experiences and conversation context (e.g., "User debugged an OOM issue in the monitoring namespace")
- Procedural: Behavioral patterns (e.g., "User prefers concise responses with YAML examples")
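One way to picture these categories is a tagged record type. The `Memory` dataclass below is purely illustrative; LangMem's actual record shape may differ:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str      # one of "semantic", "episodic", "procedural"
    content: str

# The three example memories from the categories above.
memories = [
    Memory("semantic", "User's team uses ArgoCD on prod-west cluster"),
    Memory("episodic", "User debugged an OOM issue in the monitoring namespace"),
    Memory("procedural", "User prefers concise responses with YAML examples"),
]

semantic = [m.content for m in memories if m.kind == "semantic"]
```

Tagging records by kind lets a retrieval layer weight them differently, e.g. always injecting procedural preferences while searching semantic facts by relevance.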
### Feature Flag

Extraction is controlled by `ENABLE_FACT_EXTRACTION` (default: `false`). It is disabled by default because:
- It adds one LLM call per conversation turn (cost/latency consideration)
- Teams should opt-in after evaluating cost vs. benefit
- InMemoryStore (default) loses data on restart, so extraction is most useful with Redis/Postgres store backends
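A minimal sketch of the opt-in gate, assuming the flag is read from the environment (the helper names and the accepted truthy values are assumptions, not the actual implementation):

```python
import os

def fact_extraction_enabled() -> bool:
    # Disabled unless ENABLE_FACT_EXTRACTION is explicitly set truthy.
    return os.getenv("ENABLE_FACT_EXTRACTION", "false").lower() in ("1", "true", "yes")

def maybe_schedule_extraction(schedule) -> bool:
    if not fact_extraction_enabled():
        return False   # opt-in: skip the extra LLM call by default
    schedule()
    return True

os.environ["ENABLE_FACT_EXTRACTION"] = "false"
assert maybe_schedule_extraction(lambda: None) is False

os.environ["ENABLE_FACT_EXTRACTION"] = "true"
assert maybe_schedule_extraction(lambda: None) is True
```

Checking the flag at scheduling time (rather than at startup) means teams can toggle extraction per deployment without code changes.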
### Configuration

```shell
ENABLE_FACT_EXTRACTION=false   # Enable/disable background fact extraction
FACT_EXTRACTION_MODEL=         # Model for extraction (empty = use the default LLM)
```
## Components Changed

- `ai_platform_engineering/utils/agent_memory/fact_extraction.py` (new) - Extraction logic, feature flag, `MemoryStoreManager` factory
- `ai_platform_engineering/multi_agents/platform_engineer/protocol_bindings/a2a/agent.py` - Background task launch after `stream()`
- `ai_platform_engineering/multi_agents/agent_registry.py` - Added `FACT_EXTRACTION` to `DEFAULT_REGISTRY_EXCLUSIONS`
- `.env.example` / `docker-compose.dev.yaml` - New environment variables
- cross-thread-store spec - Updated spec with Phase 4
- `tests/test_fact_extraction.py` (new) - 65 unit tests covering the feature flag, config, extraction, store compatibility, and edge cases
- `tests/test_store.py` (enhanced) - 86 unit tests covering the store factory, operations, cross-thread context, and integration
- `tests/test_checkpoint.py` (new) - 49 unit tests covering checkpointer lifecycle, thread isolation, state round-trip, auto-trim, safe split, repair fallback, concurrent access, and agent wiring
- `integration/test_fact_extraction_live.py` (new) - Live integration test with recall verification and user isolation
## Dependency: cnoe-agent-utils

The `trace_agent_stream` decorator in cnoe-agent-utils required a fix to forward `**kwargs` so that `user_id` can be propagated through the decorated `stream()` method. This fix is backward-compatible.
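The class of bug being fixed can be shown with a generic tracing decorator (this is a hypothetical reconstruction of the pattern, not the actual cnoe-agent-utils code): if the wrapper accepts only positional arguments, keyword arguments like `user_id` are silently dropped.

```python
import functools

def trace_stream(fn):
    # Correct form: the wrapper accepts and forwards **kwargs, so
    # keyword arguments such as user_id reach the wrapped function.
    # A wrapper written as `def wrapper(*args):` would drop them.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # (tracing/telemetry logic would go here)
        return fn(*args, **kwargs)
    return wrapper

@trace_stream
def stream(query, *, user_id=None):
    return {"query": query, "user_id": user_id}

result = stream("list pods", user_id="alice")
```

Because the fixed wrapper passes through anything it does not recognize, the change is backward-compatible with callers that never supplied keyword arguments.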
## Related
- Spec: spec.md