Architecture

This page provides an overview of the CAIPE RAG system architecture, including core components, data flows, and technology decisions.

For implementation details and configuration, see `Architecture.md` in the RAG codebase.

System Overview​

CAIPE RAG is composed of three main components that work together to ingest, process, and serve knowledge:

| Component | Port | Purpose |
| --- | --- | --- |
| Server | 9446 | Core API for ingestion, hybrid search, graph exploration, and MCP tools |
| Ontology Agent | 8098 | Automated relationship discovery using LLM evaluation |
| Ingestors | - | External services that pull data from various sources |

Diagram: Component Architecture​

Data Flow​

Document Ingestion​

When documents are ingested, they flow through a processing pipeline that prepares them for both vector search and graph storage.

Flow:

  1. External Source → Ingestor fetches data (e.g., AWS API, Kubernetes API, web crawler)
  2. Ingestor → Server API (POST /v1/ingest) with documents and metadata
  3. Server → Processes documents:
    • Text chunking with overlap for context preservation
    • Dual embedding generation (dense + sparse vectors)
    • Graph entity parsing and nested structure splitting
  4. Storage → Milvus (vectors) + Neo4j (graph entities) + Redis (metadata)
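The ingestor-to-server handoff in step 2 can be sketched as an HTTP request. The endpoint (`POST /v1/ingest`) and port come from the tables above; the payload field names (`datasource`, `documents`, `metadata`) are assumptions for illustration, not the documented schema.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:9446"  # Server port from the component table

def build_ingest_request(datasource: str, documents: list[dict]) -> urllib.request.Request:
    """Assemble the POST /v1/ingest request (payload field names assumed)."""
    payload = {"datasource": datasource, "documents": documents}
    return urllib.request.Request(
        f"{SERVER_URL}/v1/ingest",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

docs = [{
    "id": "i-0abc123",
    "content": "EC2 instance i-0abc123 in us-east-1",
    "metadata": {"source": "aws-api", "region": "us-east-1"},
}]
req = build_ingest_request("aws", docs)
# urllib.request.urlopen(req)  # send once the Server is running
```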

Key Processing Steps:

| Step | Description |
| --- | --- |
| Chunking | Large documents split on paragraph/sentence boundaries with overlap |
| Dense Embedding | Semantic vectors via OpenAI, Azure OpenAI, or other providers |
| Sparse Embedding | BM25 vectors for keyword matching (generated by Milvus) |
| Entity Splitting | Nested JSON structures split into connected sub-entities |
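A minimal illustration of chunking with overlap: each chunk shares its trailing characters with the start of the next chunk, so context at the boundary is preserved. The real pipeline splits on paragraph/sentence boundaries; the character-based sizes here are arbitrary.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    chunks = []
    step = chunk_size - overlap  # advance by less than chunk_size
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(text)  # 3 chunks; adjacent chunks share 50 characters
```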

Diagram: Ingestion Pipeline​

Hybrid Search

Queries combine semantic and keyword search for comprehensive results.

Flow:

  1. User Query → Server API (POST /v1/query)
  2. Filter Application → Metadata filters narrow search scope
  3. Dual Search:
    • Semantic search using dense vectors (cosine similarity)
    • Keyword search using BM25 sparse vectors
  4. Weighted Reranking → Combine scores with configurable weights
  5. Results → Ranked documents with relevance scores
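A query following the flow above might look like this. The filter keys and weight parameter names are assumptions for illustration, not the documented request schema.

```python
# Sketch of a hybrid-search request (POST /v1/query on port 9446).
query = {
    "query": "which pods run on node-1?",
    "filters": {"datasource": "kubernetes"},  # step 2: narrow the search scope
    "semantic_weight": 0.9,                   # step 4: reranking weights
    "keyword_weight": 0.1,
}
# requests.post("http://localhost:9446/v1/query", json=query)  # send to the Server
```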

Search Strategies:

| Strategy | Semantic Weight | Keyword Weight | Best For |
| --- | --- | --- | --- |
| Balanced (default) | 50% | 50% | General queries |
| Semantic | 90% | 10% | Conceptual questions |
| Keyword | 10% | 90% | Exact term matching |
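The weighted reranking in step 4 amounts to a convex combination of the two scores. This sketch assumes both scores are normalized to [0, 1]; the actual fusion performed by Milvus hybrid search may differ.

```python
def rerank(semantic_score: float, keyword_score: float,
           semantic_weight: float = 0.5, keyword_weight: float = 0.5) -> float:
    """Combine the two search scores with configurable weights (step 4)."""
    return semantic_weight * semantic_score + keyword_weight * keyword_score

rerank(0.8, 0.4)              # balanced strategy: ~0.6
rerank(0.8, 0.4, 0.9, 0.1)    # semantic strategy: ~0.76
```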

Ontology Discovery​

The Ontology Agent automatically discovers relationships between entity types. See Ontology Agent for conceptual details.

Flow:

  1. Data Graph → Ontology Agent reads entity types and properties
  2. Candidate Discovery → BM25 fuzzy search finds potential relationships
  3. Validation → Deep property matching validates candidates
  4. LLM Evaluation → Parallel workers evaluate relationship validity
  5. Sync → Accepted relationships written to data graph
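Step 3 (deep property matching) can be illustrated with a toy validator that keeps a candidate relationship only when the two entity types actually share property values. The entity shapes, key names, and values here are invented for the sketch.

```python
def validate_candidate(src_props: dict, dst_props: dict,
                       src_key: str, dst_key: str) -> bool:
    """Keep a candidate edge only if the linked properties share values."""
    src_values = {str(v) for v in src_props.get(src_key, [])}
    dst_values = {str(v) for v in dst_props.get(dst_key, [])}
    return bool(src_values & dst_values)

pods = {"node_name": ["node-1", "node-2"]}
nodes = {"name": ["node-1", "node-3"]}
validate_candidate(pods, nodes, "node_name", "name")  # True: both contain "node-1"
```

Candidates that survive this check would then go to the LLM evaluation workers in step 4.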

Technology Stack​

Databases​

| Database | Purpose | Key Features |
| --- | --- | --- |
| Milvus | Vector storage and hybrid search | HNSW index for dense vectors, inverted index for BM25 |
| Neo4j | Knowledge graph storage | Cypher queries, relationship traversal, APOC plugins |
| Redis | Metadata and caching | Job queues, datasource metadata, ontology metrics |

Backend​

| Technology | Purpose |
| --- | --- |
| Python 3.13+ | Primary language with UV package manager |
| FastAPI | REST API framework |
| LangChain | Document processing and LLM integration |
| LangGraph | Agent workflows for ontology discovery |
| FastMCP | Model Context Protocol server |

Embeddings Providers​

The system supports multiple embedding providers:

  • Azure OpenAI
  • OpenAI
  • AWS Bedrock
  • Cohere
  • HuggingFace (local models)
  • Ollama (local models)
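A provider-agnostic setup might dispatch on a configuration value, e.g. via LangChain (which is already in the stack above). The package names, model names, and the `EMBEDDINGS_PROVIDER` variable below are assumptions, not CAIPE RAG's actual configuration keys.

```python
import os

def make_embeddings(provider: str):
    """Return an embeddings client for the configured provider (sketch)."""
    if provider == "openai":
        from langchain_openai import OpenAIEmbeddings  # assumed package
        return OpenAIEmbeddings(model="text-embedding-3-small")
    if provider == "huggingface":
        from langchain_huggingface import HuggingFaceEmbeddings  # assumed package
        return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    raise ValueError(f"unsupported embeddings provider: {provider!r}")

provider = os.environ.get("EMBEDDINGS_PROVIDER", "openai")
# embeddings = make_embeddings(provider)  # instantiate once credentials are set
```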

Infrastructure​

| Component | Purpose |
| --- | --- |
| Docker / Docker Compose | Containerization and orchestration |
| MinIO | Object storage for Milvus |
| Etcd | Configuration management for Milvus |

Port Reference​

| Port | Service | Protocol |
| --- | --- | --- |
| 9446 | Server REST API | HTTP |
| 9446 | Server MCP | HTTP (SSE) |
| 8098 | Ontology Agent | HTTP |
| 7687 | Neo4j | Bolt |
| 7474 | Neo4j Browser | HTTP |
| 19530 | Milvus | gRPC |
| 6379 | Redis | TCP |

Further Reading​