RAG Ingestion Service

The RAG Ingestion Service provides a convenient way to populate your knowledge base with content from various sources. It includes both a web UI for interactive use and API endpoints for programmatic access.

🌐 Web UI Interface

The RAG system includes a user-friendly web interface for document ingestion and querying, accessible at http://localhost:9447 when running the KB-RAG services.

Features

📄 URL Ingestion: Import content from web pages and documentation sites
🔍 Interactive Querying: Test your knowledge base with real-time search
⚙️ Configurable Parameters: Adjust search limits and similarity thresholds
📊 Real-time Status: Monitor ingestion progress and results

🚀 Getting Started with Web UI

1. Access the Interface

Once your KB-RAG services are running, navigate to:

http://localhost:9447

2. Ingest Content

Step 1: In the "Ingest URL" section, enter the URL of the content you want to add:

https://docs.example.com

Step 2: Click the "Ingest" button to start the process

Step 3: Monitor the status. A confirmation appears when processing is complete.

3. Query Your Knowledge Base

Step 1: In the "Query" section, enter your question:

How do I configure authentication?

Step 2: Adjust parameters if needed:

Limit: Number of results to return (default: 3)
Similarity: Minimum similarity threshold (default: 0.7)

Step 3: Click "Search" to retrieve relevant information
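
To make the two parameters concrete, here is a minimal sketch of how a result limit and a minimum-similarity threshold interact. It assumes each candidate carries a similarity score where higher means a closer match. The `apply_search_params` function and the `(text, score)` tuple shape are hypothetical and only illustrate the idea; the service's internal ranking may differ.

```python
from typing import List, Tuple


def apply_search_params(
    scored: List[Tuple[str, float]],
    limit: int = 3,
    similarity: float = 0.7,
) -> List[Tuple[str, float]]:
    """Keep candidates at or above the threshold, best first, capped at `limit`.

    Assumption: a higher score means a closer match.
    """
    kept = [item for item in scored if item[1] >= similarity]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Hypothetical candidates with made-up scores, for illustration only.
candidates = [
    ("chunk A", 0.91),
    ("chunk B", 0.65),
    ("chunk C", 0.78),
    ("chunk D", 0.72),
    ("chunk E", 0.88),
]
top = apply_search_params(candidates)
print(top)
```

With the defaults, "chunk B" is dropped by the threshold and "chunk D" by the limit. Raising the limit or lowering the threshold widens the result set.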

🔧 How Ingestion Works

The ingestion process follows these steps:

Processing Steps

1. Content Crawling: Fetches content from the provided URL
2. HTML Parsing: Extracts meaningful text from the HTML structure
3. Text Chunking: Splits large documents using a recursive text splitter
4. Embedding Generation: Converts text chunks to vector embeddings using an embedding model
5. Storage: Saves embeddings and metadata to the Milvus vector database
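
As an illustration of the HTML parsing step, the sketch below extracts visible text from a page using only Python's standard library. It is not the service's actual parser: the `TextExtractor` class, the `html_to_text` helper, and the choice of which tags count as block-level are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}
    # Closing one of these tags ends a line in the extracted text (assumed set).
    BLOCK = {"p", "div", "li", "tr", "title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "br":
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Deploy</h1><p>Run the <b>installer</b>.</p>"
    "<script>track()</script></body></html>"
)
print(html_to_text(sample))
```

The extracted text is what the later chunking and embedding steps would operate on.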

📡 API Endpoints

For programmatic access, the ingestion service provides REST APIs:

Ingest Content from URL

curl -X POST "http://localhost:9447/v1/datasource/ingest/url" \
     -H "Content-Type: application/json" \
     -d '{
       "url": "https://docs.example.com",
       "params": {}
     }'

Response:

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "pending",
  "message": "Ingestion job started"
}

Check Ingestion Status

curl -X GET "http://localhost:9447/v1/datasource/ingest/status/{job_id}"

Response:

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "progress": {
    "message": "Ingestion completed",
    "processed": 50,
    "total": 50
  },
  "created_at": "2024-01-01T10:00:00",
  "completed_at": "2024-01-01T10:02:30",
  "error": null
}
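
The two endpoints above can be combined into a small client that submits a URL and waits for the job to finish. This is a minimal sketch using only the Python standard library; the helper names (`build_ingest_request`, `start_ingestion`, `fetch_status`, `wait_for_job`) and the timeout values are assumptions for illustration, while the paths, payload, and the `completed`/`failed` statuses come from the examples on this page.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:9447"  # default address used throughout this page


def build_ingest_request(url: str) -> urllib.request.Request:
    """Build the POST request for /v1/datasource/ingest/url."""
    body = json.dumps({"url": url, "params": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/datasource/ingest/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_ingestion(url: str) -> str:
    """Submit a URL and return the job_id from the response."""
    with urllib.request.urlopen(build_ingest_request(url), timeout=30) as resp:
        return json.load(resp)["job_id"]


def fetch_status(job_id: str) -> Dict:
    """GET the status document for one ingestion job."""
    with urllib.request.urlopen(
        f"{BASE_URL}/v1/datasource/ingest/status/{job_id}", timeout=10
    ) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], Dict] = fetch_status,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict:
    """Poll until the job reports "completed" or "failed", or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        time.sleep(interval)
```

With the service running, `wait_for_job(start_ingestion("https://docs.example.com"))` returns the final status document. The `get_status` parameter exists so the polling logic can be exercised without a live service.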

Query Knowledge Base

curl -X POST "http://localhost:9447/v1/query" \
     -H "Content-Type: application/json" \
     -d '{
       "query": "How do I deploy applications?",
       "limit": 5,
       "similarity_threshold": 0.7
     }'

Response:

{
  "query": "How do I deploy applications?",
  "results": [
    {
      "page_content": "To deploy applications...",
      "metadata": {
        "source": "https://docs.example.com/deploy",
        "title": "Deployment Guide"
      }
    }
  ]
}
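
When consuming this response in code, it is often convenient to flatten each result into a few display fields. The sketch below does that for the response shape shown above; the `summarize_results` helper, the fallback strings, and the preview length are hypothetical choices for this example.

```python
from typing import Dict, List


def summarize_results(response: Dict, preview_chars: int = 80) -> List[Dict[str, str]]:
    """Flatten a /v1/query response into title, source, and a short preview."""
    rows = []
    for doc in response.get("results", []):
        metadata = doc.get("metadata") or {}
        rows.append(
            {
                "title": metadata.get("title", "(untitled)"),
                "source": metadata.get("source", "(unknown)"),
                "preview": doc.get("page_content", "")[:preview_chars],
            }
        )
    return rows


# The sample response from this page, used as test input.
sample_response = {
    "query": "How do I deploy applications?",
    "results": [
        {
            "page_content": "To deploy applications...",
            "metadata": {
                "source": "https://docs.example.com/deploy",
                "title": "Deployment Guide",
            },
        }
    ],
}

for row in summarize_results(sample_response):
    print(f"{row['title']} ({row['source']}): {row['preview']}")
```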

Clear All Data

curl -X POST "http://localhost:9447/v1/datasource/clear_all"

Health Check

curl -X GET "http://localhost:9447/healthz"

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-01T10:00:00"
}

⚙️ Configuration Options

Collection Management

The ingestion service uses collections to organize different knowledge bases:

Default Collection: rag_default (configurable via DEFAULT_VSTORE_COLLECTION environment variable)
Collection Format: All documents are stored in the configured collection
Environment Configuration: Set DEFAULT_VSTORE_COLLECTION=your_collection_name to use a custom collection
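
The fallback behavior described above can be sketched as follows. How the service itself reads the variable is not shown on this page, so the `resolve_collection` helper and the choice to treat an empty value as unset are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_collection(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured collection name, or rag_default when unset."""
    env = os.environ if env is None else env
    # Assumption: a blank value is treated the same as a missing one.
    name = env.get("DEFAULT_VSTORE_COLLECTION", "").strip()
    return name or "rag_default"


print(resolve_collection({}))
print(resolve_collection({"DEFAULT_VSTORE_COLLECTION": "team_docs"}))
```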

Chunking Parameters

Customize how documents are split:

chunk_size: 1000          # Characters per chunk
chunk_overlap: 200        # Overlap between chunks
separators: ["\n\n", "\n", " ", ""]  # Split hierarchy
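
To show how these three settings work together, here is a simplified recursive splitter. It is a sketch of the general technique, not the service's actual splitter: the `split_pieces` and `chunk_text` names are hypothetical, pieces are rejoined with a single space (so original line breaks are not preserved), and the overlap is a raw character tail that may start mid-word.

```python
from typing import List

SEPARATORS = ["\n\n", "\n", " ", ""]  # same hierarchy as the setting above


def split_pieces(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Break text into pieces of at most chunk_size, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pieces: List[str] = []
    for part in text.split(separators[0]):
        pieces.extend(split_pieces(part, chunk_size, separators[1:]))
    return pieces


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Greedily pack pieces into chunks, carrying a tail of the previous chunk as overlap."""
    chunks: List[str] = []
    current = ""
    for piece in split_pieces(text, chunk_size, SEPARATORS):
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        chunks.append(current)
        tail = current[-chunk_overlap:] if chunk_overlap > 0 else ""
        with_overlap = f"{tail} {piece}" if tail else piece
        # Drop the overlap when it would push the chunk past chunk_size.
        current = with_overlap if len(with_overlap) <= chunk_size else piece
    if current:
        chunks.append(current)
    return chunks


demo = "\n\n".join(
    " ".join(f"word{i}" for i in range(p * 20, p * 20 + 20)) for p in range(5)
)
for chunk in chunk_text(demo, chunk_size=100, chunk_overlap=20):
    print(len(chunk), repr(chunk[:40]))
```

Smaller chunks give more precise matches but less context per result. The overlap reduces the chance that a sentence split across a boundary is lost to both chunks.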

Embedding Configuration

Vector embedding settings:

embedding_model: "text-embedding-ada-002"  # Model for embeddings
dimension: 1536                            # Embedding dimensions
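
The `dimension` value must match the length of every vector the embedding model produces, since a vector collection is created with a fixed dimension. The sketch below shows a dimension check and a cosine similarity function in plain Python. Cosine similarity is an assumption here: this page does not state which metric the similarity threshold is measured in. The helper names are hypothetical.

```python
import math
from typing import Sequence

EXPECTED_DIMENSION = 1536  # matches the `dimension` setting above


def check_dimension(vector: Sequence[float], expected: int = EXPECTED_DIMENSION) -> None:
    """Raise if an embedding does not have the configured number of components."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (assumed metric, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


check_dimension([0.0] * 1536)  # passes silently
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```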

🛠️ Advanced Usage

Batch Ingestion

For multiple URLs, you can script the ingestion using curl:

#!/bin/bash

urls=(
    "https://docs.example.com/guide1"
    "https://docs.example.com/guide2"
    "https://docs.example.com/api-reference"
)

for url in "${urls[@]}"; do
    echo "Ingesting: $url"
    # Build the JSON body with jq so special characters in the URL are escaped
    job_id=$(curl -s -X POST "http://localhost:9447/v1/datasource/ingest/url" \
                  -H "Content-Type: application/json" \
                  -d "$(jq -n --arg url "$url" '{url: $url, params: {}}')" | \
             jq -r '.job_id')
    echo "Job ID: $job_id"
    
    # Wait for completion (optional)
    while true; do
        status=$(curl -s "http://localhost:9447/v1/datasource/ingest/status/$job_id" | jq -r '.status')
        if [ "$status" = "completed" ] || [ "$status" = "failed" ]; then
            echo "Status: $status"
            break
        fi
        echo "Status: $status - waiting..."
        sleep 5
    done
done

Job Status Monitoring

Monitor ingestion progress programmatically:

# Start ingestion and capture job ID
job_id=$(curl -s -X POST "http://localhost:9447/v1/datasource/ingest/url" \
              -H "Content-Type: application/json" \
              -d '{"url": "https://docs.example.com", "params": {}}' | \
         jq -r '.job_id')

echo "Job started with ID: $job_id"

# Poll for status
while true; do
    response=$(curl -s "http://localhost:9447/v1/datasource/ingest/status/$job_id")
    status=$(echo "$response" | jq -r '.status')
    progress=$(echo "$response" | jq -r '.progress.message')
    
    echo "Status: $status - $progress"
    
    if [ "$status" = "completed" ] || [ "$status" = "failed" ]; then
        echo "Final response:"
        echo "$response" | jq .
        break
    fi
    
    sleep 2
done

🔍 Troubleshooting

Common Issues

Ingestion Fails

Verify URL is accessible
Check for authentication requirements
Ensure sufficient disk space

No Search Results

Lower similarity threshold
Try different query phrasing
Verify content was ingested successfully

Slow Performance

Check Milvus database health
Monitor system resources
Consider adjusting the chunking parameters (chunk_size, chunk_overlap)

Logs and Monitoring

Monitor ingestion progress through Docker logs:

docker logs kb-rag-server

Check RAG service health:

curl http://localhost:9447/healthz

Check Milvus database status:

curl http://localhost:9091/webui

🎯 Best Practices

Content Selection

Choose high-quality, well-structured documentation
Avoid duplicate content across URLs
Ensure content is up-to-date and accurate

Query Optimization

Use specific, contextual questions
Include relevant keywords from your domain
Test different similarity thresholds for your use case
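
One way to test different thresholds is to run the same question at several values and compare how many results come back. The sketch below does this against the `/v1/query` endpoint using only the standard library. The `query_service` and `sweep_thresholds` helpers and the threshold values are assumptions for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Sequence

BASE_URL = "http://localhost:9447"  # default address used throughout this page


def query_service(query: str, threshold: float, limit: int = 10) -> List[dict]:
    """POST one query to /v1/query and return its results list."""
    body = json.dumps(
        {"query": query, "limit": limit, "similarity_threshold": threshold}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["results"]


def sweep_thresholds(
    query: str,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
    run_query: Callable[[str, float], List] = query_service,
) -> Dict[float, int]:
    """Return the number of results obtained at each threshold."""
    return {t: len(run_query(query, t)) for t in thresholds}
```

With the service running, `sweep_thresholds("How do I configure authentication?")` returns a mapping from threshold to result count. A threshold at which the count drops to zero is likely too strict for that kind of question.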

Data Management

Use descriptive collection names (set via DEFAULT_VSTORE_COLLECTION) for different deployments
Clear outdated content using the /v1/datasource/clear_all endpoint
Monitor ingestion job status to ensure successful processing
Organize different content types in separate service instances if needed

🔗 Integration Examples

With Platform Engineer Agent

The ingested content automatically becomes available to the platform engineer agent:

# Agent will use RAG for documentation queries
"How do I configure SSL certificates?"

With Custom Applications

# Query the RAG service directly from any application
query_response=$(curl -s -X POST "http://localhost:9447/v1/query" \
                      -H "Content-Type: application/json" \
                      -d '{
                        "query": "deployment process",
                        "limit": 5,
                        "similarity_threshold": 0.7
                      }')

echo "$query_response" | jq '.results[] | {content: .page_content, source: .metadata.source}'

For Python integration:

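import requests

def query_rag(query: str, limit: int = 3, threshold: float = 0.7) -> dict:
    """Query the RAG service and return the parsed JSON response."""
    response = requests.post(
        "http://localhost:9447/v1/query",
        json={
            "query": query,
            "limit": limit,
            "similarity_threshold": threshold
        },
        timeout=30,  # avoid hanging if the service is unreachable
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()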

# Usage
results = query_rag("How to deploy applications?")
for doc in results["results"]:
    print(f"Content: {doc['page_content']}")
    print(f"Source: {doc['metadata']['source']}")

📚 Related Documentation

RAG Agent Overview - Main RAG agent documentation
Graph RAG - Knowledge graph-based RAG
Platform Engineer - Main agent system
