
Ontology Agent

The Ontology Agent automatically discovers and validates relationships between entity types in the knowledge graph. Instead of manually defining schemas, the agent uses fuzzy matching and LLM evaluation to identify meaningful relationships.

For configuration, implementation details, and the full architecture, see the Ontology Agent README.

Why Automatic Ontology Discovery?

When ingesting data from multiple sources (AWS, Kubernetes, Backstage, etc.), entities naturally have relationships:

  • A Pod runs on a Node
  • A Deployment manages ReplicaSets
  • An EC2 Instance belongs to a VPC
  • A Backstage Component depends on other Components

Manually defining these relationships doesn't scale. The Ontology Agent:

  • Discovers relationships automatically by analyzing property patterns
  • Validates with LLMs to ensure semantic correctness
  • Adapts to new data as entity types evolve
  • Runs continuously in the background

How It Works

The agent uses a multi-stage pipeline to discover relationships:

1. Candidate Discovery

The agent builds an in-memory search index of all entity types and their identity properties. Using BM25 fuzzy search, it finds potential matches between entity properties.

Example: A Pod has a spec.nodeName property. The agent searches for entities where identity keys match "node" patterns and finds Node entities.

Optimization: A Bloom filter pre-filters searches, eliminating 80-90% of non-matching queries before the BM25 search runs.
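The discovery stage can be sketched as follows. This is a minimal illustration, not the agent's actual code: the Bloom filter is a toy implementation, the token index is invented, and plain token overlap stands in for the real BM25 search.

```python
import hashlib
import re

class BloomFilter:
    """Toy Bloom filter: k hash positions over a bit array (illustrative only)."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# Hypothetical index: entity type -> tokens from its identity properties.
IDENTITY_TOKENS = {"Node": {"node", "name"}, "Vpc": {"vpc", "id"}}

bloom = BloomFilter()
for tokens in IDENTITY_TOKENS.values():
    for token in tokens:
        bloom.add(token)

def tokenize(prop_name):
    """Split 'spec.nodeName' into {'spec', 'node', 'name'}."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*", prop_name)}

def candidate_types(prop_name):
    """Pre-filter tokens with the Bloom filter, then look up overlapping types
    (token overlap stands in for the real BM25 scoring)."""
    tokens = {t for t in tokenize(prop_name) if bloom.might_contain(t)}
    return sorted(name for name, idx in IDENTITY_TOKENS.items() if tokens & idx)
```

The Bloom filter check is cheap and has no false negatives, so tokens it rejects can safely skip the more expensive search, which is where the 80-90% saving comes from.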

2. Deep Property Matching

For each candidate relationship, the agent validates the match by comparing properties in detail:

| Match Type | Quality Score | Example |
| --- | --- | --- |
| Exact | 1.0 | "web" matches "web" |
| Prefix | 0.8 | "web-pod" matches "web" |
| Suffix | 0.7 | "my-web" matches "web" |
| Contains | 0.85 | Array contains the value |

The agent computes a quality score combining BM25 relevance, match quality, and uniqueness.
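Taken together, the match-type table and the scoring rule above can be sketched like this; the blend weights are illustrative assumptions, not the agent's actual values.

```python
def match_quality(source, target):
    """Score one property match per the table above."""
    if isinstance(source, (list, tuple)):  # Contains: array holds the value
        return 0.85 if target in source else 0.0
    if source == target:                   # Exact
        return 1.0
    if source.startswith(target):          # Prefix
        return 0.8
    if source.endswith(target):            # Suffix
        return 0.7
    return 0.0

def quality_score(bm25, match, uniqueness, weights=(0.3, 0.5, 0.2)):
    """Weighted blend of BM25 relevance, match quality, and uniqueness
    (the weights here are assumptions for illustration)."""
    w_bm25, w_match, w_uniq = weights
    return w_bm25 * bm25 + w_match * match + w_uniq * uniqueness
```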

3. LLM Evaluation

Candidates that meet quality thresholds are sent to parallel LLM workers for evaluation. Each worker:

  • Reviews example entity pairs
  • Examines property mappings
  • Considers semantic meaning
  • Decides: Accept, Reject, or Unsure

Accept: The relationship is valid. The agent assigns a semantic name (e.g., RUNS_ON, MANAGES, BELONGS_TO).

Reject: The relationship is invalid despite matching properties (e.g., coincidental name overlap).

Unsure: Insufficient evidence. The relationship is revisited when more data is available.
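The three verdicts can be routed as sketched below. The data shapes and the `evaluate` callable are hypothetical; a thread pool stands in for the parallel LLM workers.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    UNSURE = "unsure"

def route(candidate, verdict, relationship=None):
    """Map a worker's verdict to the agent's next action (shapes hypothetical)."""
    if verdict is Verdict.ACCEPT:
        return {"candidate": candidate, "action": "sync",
                "relationship": relationship}      # e.g. RUNS_ON
    if verdict is Verdict.REJECT:
        return {"candidate": candidate, "action": "discard"}
    return {"candidate": candidate, "action": "revisit"}  # wait for more data

def evaluate_all(candidates, evaluate, workers=4):
    """Fan candidates out to parallel workers; `evaluate` stands in for the
    LLM call and returns (verdict, relationship_name)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: route(c, *evaluate(c)), candidates))
```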

4. Synchronization

Accepted relationships are synced back to the data graph:

  • Relationship edges created between matching entities
  • Property mapping rules stored for future matching
  • Sync status tracked for monitoring
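The sync step can be sketched as generating a Cypher statement plus a stored mapping rule. All labels, property names, and dict keys here are hypothetical; a real implementation would execute the statement through the Neo4j driver.

```python
def sync_statements(rel):
    """Build the Cypher for one accepted relationship and its mapping rule.
    Labels and property names are illustrative, not the agent's schema."""
    edge_cypher = (
        f"MATCH (a:{rel['source']}), (b:{rel['target']}) "
        f"WHERE a.{rel['source_prop']} = b.{rel['target_prop']} "
        f"MERGE (a)-[:{rel['name']}]->(b)"
    )
    mapping_rule = {
        "source": rel["source"],
        "target": rel["target"],
        "mapping": {rel["source_prop"]: rel["target_prop"]},
        "sync_status": "synced",  # tracked for monitoring
    }
    return edge_cypher, mapping_rule
```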

Diagram: Ontology Discovery Flow

Automatic vs. Manual Trigger

Automatic Mode

The agent runs on a timer (default: every 6 hours):

  1. Discovers new relationship candidates
  2. Re-evaluates candidates where data has changed significantly
  3. Syncs accepted relationships to the data graph
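The timer behavior reduces to a simple loop. The 6-hour default comes from the docs; the injectable `sleep` and finite `cycles` are for illustration only.

```python
import time

def agent_loop(run_cycle, cycles, interval_s=6 * 3600, sleep=time.sleep):
    """Run `cycles` discover/re-evaluate/sync passes with a pause between them.
    A real agent would loop indefinitely and handle shutdown signals."""
    for i in range(cycles):
        run_cycle()
        if i < cycles - 1:
            sleep(interval_s)
```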

Manual Trigger

Trigger processing via API:

```shell
# Trigger full processing cycle
curl -X POST http://localhost:8098/v1/graph/ontology/agent/regenerate_ontology

# Check agent status
curl http://localhost:8098/v1/graph/ontology/agent/status
```

When Relationships Are Re-evaluated

The agent tracks metrics for each relationship candidate. Re-evaluation triggers when:

  • Count changes significantly (default: 10% change in match count)
  • Quality score changes (new properties or improved matching)
  • Manual trigger via API

This ensures the ontology stays current without unnecessary LLM calls.
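The trigger logic can be sketched as a single predicate. The 10% count threshold comes from the text above; treating any score change as a trigger is a simplifying assumption.

```python
def needs_reevaluation(prev_count, new_count, prev_score, new_score,
                       count_delta=0.10):
    """True when a candidate should be sent back to the LLM stage."""
    if prev_count and abs(new_count - prev_count) / prev_count >= count_delta:
        return True  # match count moved by >= 10%
    return new_score != prev_score  # new properties or improved matching
```

Gating on these metrics, rather than re-evaluating everything each cycle, is what keeps LLM spend bounded as the graph grows.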

Storage Architecture

The agent uses dual storage for optimal performance:

| Storage | Purpose | Data Stored |
| --- | --- | --- |
| Redis | Hot metrics | Match counts, quality scores, recent examples |
| Neo4j | Structure | Entity schemas, evaluation results, relationships |

Redis handles frequent updates during candidate discovery, while Neo4j stores the permanent schema structure and evaluation history.
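The split can be pictured as a small facade that routes writes by volatility. The class and method names are invented, and in-memory dicts stand in for the real Redis and Neo4j clients.

```python
class OntologyStore:
    """Illustrative facade over the dual-store split (names hypothetical)."""
    def __init__(self):
        self.hot = {}    # Redis role: counters, scores, recent examples
        self.graph = {}  # Neo4j role: schemas, evaluations, relationships

    def bump_match_count(self, candidate, n=1):
        # High-frequency update during candidate discovery -> hot store.
        key = f"candidate:{candidate}:count"
        self.hot[key] = self.hot.get(key, 0) + n
        return self.hot[key]

    def record_evaluation(self, candidate, verdict):
        # Permanent evaluation history -> graph store.
        self.graph.setdefault(candidate, []).append(verdict)
```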

Versioning

Each processing run creates a new version (UUID). This enables:

  • Safe comparison between versions
  • Rollback if needed
  • Gradual schema evolution
  • Cleanup of old data
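Per-run versioning can be sketched in a few lines; the function names are invented for illustration.

```python
import uuid

def new_run_version():
    """Each processing run gets its own UUID."""
    return str(uuid.uuid4())

def tag_results(results, version):
    """Stamp every result with the run version so versions can be compared,
    rolled back, or cleaned up independently."""
    return [{**r, "version": version} for r in results]
```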

Further Reading