Architecture

This page provides an overview of the CAIPE RAG system architecture, including core components, data flows, and technology decisions.

For implementation details and configuration, see `Architecture.md` in the RAG codebase.

System Overview​

CAIPE RAG is composed of three main components that work together to ingest, process, and serve knowledge:

| Component | Port | Purpose |
| --- | --- | --- |
| Server | 9446 | Core API for ingestion, hybrid search, graph exploration, and MCP tools |
| Ontology Agent | 8098 | Automated relationship discovery using LLM evaluation |
| Ingestors | - | External services that pull data from various sources |

Diagram: Component Architecture​

Data Flow​

Document Ingestion​

When documents are ingested, they flow through a processing pipeline that prepares them for both vector search and graph storage.

Flow:

  1. External Source → Ingestor fetches data (e.g., AWS API, Kubernetes API, web crawler)
  2. Ingestor → Server API (POST /v1/ingest) with documents and metadata
  3. Server → Processes documents:
    • Text chunking with overlap for context preservation
    • Dual embedding generation (dense + sparse vectors)
    • Graph entity parsing and nested structure splitting
  4. Storage → Milvus (vectors) + Neo4j (graph entities) + Redis (metadata)
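The ingestor-to-server handoff in step 2 can be sketched as an HTTP request. The endpoint (`POST /v1/ingest`) and port come from the tables above; the payload field names (`datasource`, `documents`, `metadata`) are assumptions for illustration, not the documented schema.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:9446"  # Server port from the component table

def build_ingest_request(datasource: str, documents: list[dict]) -> urllib.request.Request:
    """Assemble the POST /v1/ingest request (payload field names assumed)."""
    payload = {"datasource": datasource, "documents": documents}
    return urllib.request.Request(
        f"{SERVER_URL}/v1/ingest",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

docs = [{
    "id": "i-0abc123",
    "content": "EC2 instance i-0abc123 in us-east-1",
    "metadata": {"source": "aws-api", "region": "us-east-1"},
}]
req = build_ingest_request("aws", docs)
# urllib.request.urlopen(req)  # send once the Server is running
```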

Key Processing Steps:

| Step | Description |
| --- | --- |
| Chunking | Large documents split on paragraph/sentence boundaries with overlap |
| Dense Embedding | Semantic vectors via OpenAI, Azure OpenAI, or other providers |
| Sparse Embedding | BM25 vectors for keyword matching (generated by Milvus) |
| Entity Splitting | Nested JSON structures split into connected sub-entities |
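A minimal illustration of chunking with overlap: each chunk shares its trailing characters with the start of the next chunk, so context at the boundary is preserved. The real pipeline splits on paragraph/sentence boundaries; the character-based sizes here are arbitrary.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    chunks = []
    step = chunk_size - overlap  # advance by less than chunk_size
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(text)  # 3 chunks; adjacent chunks share 50 characters
```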

Diagram: Ingestion Pipeline​

Hybrid Search

Queries combine semantic and keyword search for comprehensive results.

Flow:

  1. User Query → Server API (POST /v1/query)
  2. Filter Application → Metadata filters narrow search scope
  3. Dual Search:
    • Semantic search using dense vectors (cosine similarity)
    • Keyword search using BM25 sparse vectors
  4. Weighted Reranking → Combine scores with configurable weights
  5. Results → Ranked documents with relevance scores
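A query following the flow above might look like this. The filter keys and weight parameter names are assumptions for illustration, not the documented request schema.

```python
# Sketch of a hybrid-search request (POST /v1/query on port 9446).
query = {
    "query": "which pods run on node-1?",
    "filters": {"datasource": "kubernetes"},  # step 2: narrow the search scope
    "semantic_weight": 0.9,                   # step 4: reranking weights
    "keyword_weight": 0.1,
}
# requests.post("http://localhost:9446/v1/query", json=query)  # send to the Server
```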

Search Strategies:

| Strategy | Semantic Weight | Keyword Weight | Best For |
| --- | --- | --- | --- |
| Balanced (default) | 50% | 50% | General queries |
| Semantic | 90% | 10% | Conceptual questions |
| Keyword | 10% | 90% | Exact term matching |
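The weighted reranking in step 4 amounts to a convex combination of the two scores. This sketch assumes both scores are normalized to [0, 1]; the actual fusion performed by Milvus hybrid search may differ.

```python
def rerank(semantic_score: float, keyword_score: float,
           semantic_weight: float = 0.5, keyword_weight: float = 0.5) -> float:
    """Combine the two search scores with configurable weights (step 4)."""
    return semantic_weight * semantic_score + keyword_weight * keyword_score

rerank(0.8, 0.4)              # balanced strategy: ~0.6
rerank(0.8, 0.4, 0.9, 0.1)    # semantic strategy: ~0.76
```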

Ontology Discovery​

The Ontology Agent automatically discovers relationships between entity types. See Ontology Agent for conceptual details.

Flow:

  1. Data Graph → Ontology Agent reads entity types and properties
  2. Candidate Discovery → BM25 fuzzy search finds potential relationships
  3. Validation → Deep property matching validates candidates
  4. LLM Evaluation → Parallel workers evaluate relationship validity
  5. Sync → Accepted relationships written to data graph
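Step 3 (deep property matching) can be illustrated with a toy validator that keeps a candidate relationship only when the two entity types actually share property values. The entity shapes, key names, and values here are invented for the sketch.

```python
def validate_candidate(src_props: dict, dst_props: dict,
                       src_key: str, dst_key: str) -> bool:
    """Keep a candidate edge only if the linked properties share values."""
    src_values = {str(v) for v in src_props.get(src_key, [])}
    dst_values = {str(v) for v in dst_props.get(dst_key, [])}
    return bool(src_values & dst_values)

pods = {"node_name": ["node-1", "node-2"]}
nodes = {"name": ["node-1", "node-3"]}
validate_candidate(pods, nodes, "node_name", "name")  # True: both contain "node-1"
```

Candidates that survive this check would then go to the LLM evaluation workers in step 4.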

Technology Stack​

Databases​

| Database | Purpose | Key Features |
| --- | --- | --- |
| Milvus | Vector storage and hybrid search | HNSW index for dense vectors, inverted index for BM25 |
| Neo4j | Knowledge graph storage | Cypher queries, relationship traversal, APOC plugins |
| Redis | Metadata and caching | Job queues, datasource metadata, ontology metrics |

Backend​

| Technology | Purpose |
| --- | --- |
| Python 3.13+ | Primary language with UV package manager |
| FastAPI | REST API framework |
| LangChain | Document processing and LLM integration |
| LangGraph | Agent workflows for ontology discovery |
| FastMCP | Model Context Protocol server |

Embeddings Providers​

The system supports multiple embedding providers:

  • Azure OpenAI
  • OpenAI
  • AWS Bedrock
  • Cohere
  • HuggingFace (local models)
  • Ollama (local models)
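A provider-agnostic setup might dispatch on a configuration value, e.g. via LangChain (which is already in the stack above). The package names, model names, and the `EMBEDDINGS_PROVIDER` variable below are assumptions, not CAIPE RAG's actual configuration keys.

```python
import os

def make_embeddings(provider: str):
    """Return an embeddings client for the configured provider (sketch)."""
    if provider == "openai":
        from langchain_openai import OpenAIEmbeddings  # assumed package
        return OpenAIEmbeddings(model="text-embedding-3-small")
    if provider == "huggingface":
        from langchain_huggingface import HuggingFaceEmbeddings  # assumed package
        return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    raise ValueError(f"unsupported embeddings provider: {provider!r}")

provider = os.environ.get("EMBEDDINGS_PROVIDER", "openai")
# embeddings = make_embeddings(provider)  # instantiate once credentials are set
```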

Infrastructure​

| Component | Purpose |
| --- | --- |
| Docker / Docker Compose | Containerization and orchestration |
| MinIO | Object storage for Milvus |
| Etcd | Configuration management for Milvus |

Port Reference​

| Port | Service | Protocol |
| --- | --- | --- |
| 9446 | Server REST API | HTTP |
| 9446 | Server MCP | HTTP (SSE) |
| 8098 | Ontology Agent | HTTP |
| 7687 | Neo4j | Bolt |
| 7474 | Neo4j Browser | HTTP |
| 19530 | Milvus | gRPC |
| 6379 | Redis | TCP |

Further Reading​