Splunk Agent
- 🤖 Splunk Agent is an LLM-powered agent built using the LangGraph ReAct Agent workflow and Splunk MCP Server.
- 🌐 Protocol Support: Compatible with A2A protocol for integration with external user clients.
- 🛡️ Secure by Design: Enforces Splunk API token-based RBAC and supports secondary external authentication for strong access control.
- 🏭 MCP Server: The MCP server is generated by our first-party openapi-mcp-codegen utility, ensuring version/API compatibility and software supply chain integrity.
- 🔌 MCP Tools: Uses langchain-mcp-adapters to glue the tools from Splunk MCP server to LangGraph ReAct Agent Graph.
🏗️ Architecture
Detailed Sequence Diagram with Agentgateway
System Diagram
Sequence Diagram
⚙️ Local Development Setup
Use this setup to test the agent against a Splunk instance.
🔑 Get Splunk API Token
- Log in to your Splunk instance
- Go to Settings → Data Inputs → HTTP Event Collector
- Create a new token with appropriate permissions
- Save the token for your
.envfile
Add to your .env:
SPLUNK_TOKEN=<your_token>
SPLUNK_API_URL=https://your-splunk-instance.com/api
SPLUNK_VERIFY_SSL=true
Local Development
# Navigate to the Splunk agent directory
cd ai_platform_engineering/agents/splunk
# Run the MCP server in stdio mode
make run-a2a
✨ Features
- Log Search & Analytics: Search logs, run queries, and analyze data
- Alert Management: Create, update, and manage alerts and detectors
- Incident Management: Handle incidents and track their status
- Team Management: Manage teams and team members
- System Monitoring: Monitor system health and performance metrics
- Data Ingestion: Manage data sources and ingestion pipelines
- API Integration: Full Splunk API coverage through MCP tools
🎯 Example Use Cases
Ask the agent natural language questions like:
Log Analysis
- Error Investigation: "Search for error logs in the last 24 hours from the web application"
- Performance Analysis: "Show me the top 10 slowest API calls from yesterday"
- Security Monitoring: "Find all failed login attempts in the last hour"
Alert Management
- Alert Creation: "Create an alert for when CPU usage exceeds 80% for more than 5 minutes"
- Alert Monitoring: "Show me all active alerts and their current status"
- Alert Configuration: "Update the threshold for the database connection alert"
System Health
- Health Check: "Show me the current system health and any active alerts"
- Performance Metrics: "Display the average response time for the last 7 days"
- Resource Usage: "What's the current memory and CPU utilization?"
Incident Response
- Incident Management: "List all open incidents and their current status"
- Incident Investigation: "Help me investigate the cause of the recent service outage"
- Incident Resolution: "Update the status of incident INC-123 to resolved"
Data Management
- Data Sources: "List all configured data sources and their status"
- Data Ingestion: "Check the health of the log ingestion pipeline"
- Data Retention: "Show me the data retention policies for different log types"