Architecture
This page provides an overview of the CAIPE RAG system architecture, including core components, data flows, and technology decisions.
For implementation details and configuration, see Architecture.md in the RAG codebase.
System Overview
CAIPE RAG is composed of three main components that work together to ingest, process, and serve knowledge:
| Component | Port | Purpose |
|---|---|---|
| Server | 9446 | Core API for ingestion, hybrid search, graph exploration, and MCP tools |
| Ontology Agent | 8098 | Automated relationship discovery using LLM evaluation |
| Ingestors | - | External services that pull data from various sources |
Diagram: Component Architecture
Data Flow
Document Ingestion
When documents are ingested, they flow through a processing pipeline that prepares them for both vector search and graph storage.
Flow:
- External Source → Ingestor fetches data (e.g., AWS API, Kubernetes API, web crawler)
- Ingestor → Server API (`POST /v1/ingest`) with documents and metadata
- Server → Processes documents:
  - Text chunking with overlap for context preservation
  - Dual embedding generation (dense + sparse vectors)
  - Graph entity parsing and nested structure splitting
- Storage → Milvus (vectors) + Neo4j (graph entities) + Redis (metadata)
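The chunking step in the flow above can be illustrated with a minimal sketch. It assumes a naive regex sentence splitter and a character budget per chunk; the function name `chunk_text` and its parameters `max_chars` and `overlap_sentences` are hypothetical and are not taken from the RAG codebase, which may chunk differently.

```python
import re


def chunk_text(text: str, max_chars: int = 200, overlap_sentences: int = 1) -> list[str]:
    """Split text on sentence boundaries, repeating trailing sentences as overlap.

    Hypothetical helper for illustration only.
    """
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk once adding the sentence would exceed the budget.
            chunks.append(" ".join(current))
            # Carry the last sentence(s) forward so context spans the boundary.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)

    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("A one. B two. C three. D four.", max_chars=14):
        print(chunk)
```

With one sentence of overlap, each chunk starts with the final sentence of the previous chunk, so a retrieved chunk still carries some of the context that preceded it. A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence.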
Key Processing Steps:
| Step | Description |
|---|---|
| Chunking | Large documents split on paragraph/sentence boundaries with overlap |
| Dense Embedding | Semantic vectors via OpenAI, Azure OpenAI, or other providers |
| Sparse Embedding | BM25 vectors for keyword matching (generated by Milvus) |
| Entity Splitting | Nested JSON structures split into connected sub-entities |
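As a rough illustration of entity splitting, the sketch below flattens a nested dictionary into a parent entity plus connected sub-entities. The function name `split_entity`, the `parent/key` ID scheme, and the `HAS_<KEY>` relation naming are assumptions made for this example; the server's actual naming and handling of lists may differ.

```python
from typing import Any


def split_entity(
    entity_type: str, entity_id: str, data: dict[str, Any]
) -> tuple[list[dict[str, Any]], list[tuple[str, str, str]]]:
    """Split a nested dict into flat entities and (source, relation, target) edges.

    Hypothetical helper: only nested dicts become sub-entities; lists and
    scalars stay as plain properties on the entity that contains them.
    """
    entities: list[dict[str, Any]] = []
    relations: list[tuple[str, str, str]] = []

    def visit(etype: str, eid: str, payload: dict[str, Any]) -> None:
        node: dict[str, Any] = {"type": etype, "id": eid, "properties": {}}
        entities.append(node)  # parent is recorded before its children
        for key, value in payload.items():
            if isinstance(value, dict):
                child_id = f"{eid}/{key}"
                relations.append((eid, f"HAS_{key.upper()}", child_id))
                visit(f"{etype}_{key}", child_id, value)
            else:
                node["properties"][key] = value

    visit(entity_type, entity_id, data)
    return entities, relations


if __name__ == "__main__":
    pod = {"name": "web", "spec": {"replicas": 2, "image": {"tag": "v1"}}}
    entities, relations = split_entity("Pod", "pod-1", pod)
    for entity in entities:
        print(entity)
    for relation in relations:
        print(relation)
```

Each sub-entity keeps only scalar properties, and the edges record how the pieces connect, which is the shape a graph store such as Neo4j expects.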
Diagram: Ingestion Pipeline
Query and Hybrid Search
Queries combine semantic and keyword search for comprehensive results.
Flow:
- User Query → Server API (`POST /v1/query`)
- Filter Application → Metadata filters narrow search scope
- Dual Search:
  - Semantic search using dense vectors (cosine similarity)
  - Keyword search using BM25 sparse vectors
- Weighted Reranking → Combine scores with configurable weights
- Results → Ranked documents with relevance scores
Search Strategies:
| Strategy | Semantic Weight | Keyword Weight | Best For |
|---|---|---|---|
| Balanced (default) | 50% | 50% | General queries |
| Semantic | 90% | 10% | Conceptual questions |
| Keyword | 10% | 90% | Exact term matching |
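To make the weighting concrete, here is a minimal sketch of weighted score fusion using the three strategies from the table. It assumes the dense and sparse scores are min-max normalized before being combined, which is one common choice and not necessarily what the server or Milvus does internally. The names `STRATEGIES`, `normalize`, and `weighted_rerank` are hypothetical.

```python
# (semantic weight, keyword weight) per strategy, taken from the table above.
STRATEGIES: dict[str, tuple[float, float]] = {
    "balanced": (0.5, 0.5),
    "semantic": (0.9, 0.1),
    "keyword": (0.1, 0.9),
}


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1] so dense and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - lo) / (hi - lo) for doc, score in scores.items()}


def weighted_rerank(
    dense: dict[str, float], sparse: dict[str, float], strategy: str = "balanced"
) -> list[tuple[str, float]]:
    """Combine normalized dense and sparse scores and sort best-first."""
    w_semantic, w_keyword = STRATEGIES[strategy]
    dense_n, sparse_n = normalize(dense), normalize(sparse)
    combined = {
        # A document missing from one result list contributes 0 for that list.
        doc: w_semantic * dense_n.get(doc, 0.0) + w_keyword * sparse_n.get(doc, 0.0)
        for doc in set(dense_n) | set(sparse_n)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    dense = {"a": 0.9, "b": 0.5, "c": 0.1}   # cosine similarities
    sparse = {"a": 1.0, "b": 12.0, "c": 3.0}  # raw BM25 scores
    for name in STRATEGIES:
        print(name, weighted_rerank(dense, sparse, name))
```

In this example document `a` wins under the semantic strategy because it has the highest cosine similarity, while `b` wins under the keyword and balanced strategies because of its strong BM25 score.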
Ontology Discovery
The Ontology Agent automatically discovers relationships between entity types. See Ontology Agent for conceptual details.
Flow:
- Data Graph → Ontology Agent reads entity types and properties
- Candidate Discovery → BM25 fuzzy search finds potential relationships
- Validation → Deep property matching validates candidates
- LLM Evaluation → Parallel workers evaluate relationship validity
- Sync → Accepted relationships written to data graph
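The overall shape of this pipeline can be sketched as follows. This is a simplified stand-in, not the agent's implementation: candidate discovery here uses exact value overlap between properties instead of BM25 fuzzy search, and `evaluate` is a placeholder threshold check where the real agent would call an LLM. All names (`Candidate`, `find_candidates`, `evaluate`, `discover`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source_type: str
    source_property: str
    target_type: str
    target_property: str
    match_ratio: float  # fraction of source values also present in the target


def find_candidates(
    values: dict[tuple[str, str], set[str]], min_ratio: float = 0.5
) -> list[Candidate]:
    """Propose relationships between entity types whose property values overlap."""
    candidates = []
    for (s_type, s_prop), s_vals in values.items():
        for (t_type, t_prop), t_vals in values.items():
            if s_type == t_type or not s_vals:
                continue
            ratio = len(s_vals & t_vals) / len(s_vals)
            if ratio >= min_ratio:
                candidates.append(Candidate(s_type, s_prop, t_type, t_prop, ratio))
    return candidates


def evaluate(candidate: Candidate) -> bool:
    """Placeholder for the LLM evaluation step: accept only strong matches."""
    return candidate.match_ratio >= 0.8


def discover(values: dict[tuple[str, str], set[str]], workers: int = 4) -> list[Candidate]:
    """Run discovery, then evaluate candidates in parallel worker threads."""
    candidates = find_candidates(values)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = list(pool.map(evaluate, candidates))  # preserves input order
    return [c for c, accepted in zip(candidates, verdicts) if accepted]


if __name__ == "__main__":
    values = {
        ("Pod", "node_name"): {"n1", "n2"},
        ("Node", "name"): {"n1", "n2", "n3"},
        ("Service", "name"): {"svc"},
    }
    for accepted in discover(values):
        print(accepted)
```

Threads suit this step because evaluation is dominated by waiting on remote model calls rather than by local computation. In the real flow, the accepted candidates would then be written back to the data graph.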
Technology Stack
Databases
| Database | Purpose | Key Features |
|---|---|---|
| Milvus | Vector storage and hybrid search | HNSW index for dense vectors, inverted index for BM25 |
| Neo4j | Knowledge graph storage | Cypher queries, relationship traversal, APOC plugins |
| Redis | Metadata and caching | Job queues, datasource metadata, ontology metrics |
Backend
| Technology | Purpose |
|---|---|
| Python 3.13+ | Primary language, managed with the uv package manager |
| FastAPI | REST API framework |
| LangChain | Document processing and LLM integration |
| LangGraph | Agent workflows for ontology discovery |
| FastMCP | Model Context Protocol server |
Embedding Providers
The system supports multiple embedding providers:
- Azure OpenAI
- OpenAI
- AWS Bedrock
- Cohere
- HuggingFace (local models)
- Ollama (local models)
Infrastructure
| Component | Purpose |
|---|---|
| Docker / Docker Compose | Containerization and orchestration |
| MinIO | Object storage for Milvus |
| Etcd | Metadata storage and service coordination for Milvus |
Port Reference
| Port | Service | Protocol |
|---|---|---|
| 9446 | Server REST API | HTTP |
| 9446 | Server MCP | HTTP (SSE) |
| 8098 | Ontology Agent | HTTP |
| 7687 | Neo4j | Bolt |
| 7474 | Neo4j Browser | HTTP |
| 19530 | Milvus | gRPC |
| 6379 | Redis | TCP |
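When bringing the stack up locally, a quick reachability check against these ports can help confirm which services are running. The sketch below is a hypothetical helper, not part of the codebase. It assumes all services are exposed on a single host with the default ports from the table, and it only tests that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket

# Default ports from the table above; the REST API and MCP share port 9446.
SERVICES: dict[str, int] = {
    "Server (REST API and MCP)": 9446,
    "Ontology Agent": 8098,
    "Neo4j (Bolt)": 7687,
    "Neo4j Browser": 7474,
    "Milvus": 19530,
    "Redis": 6379,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Check every known service port on the given host."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        status = "reachable" if reachable else "unreachable"
        print(f"{name:28} {SERVICES[name]:>6}  {status}")
```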
Further Reading
- Server Architecture - Detailed server internals
- Ontology Agent README - Relationship discovery details
- Server README - Configuration reference