Agent Refactoring: Unified BaseLangGraphAgent Implementation

Status: 🟢 In-use Category: Refactoring & Implementation Date: October 21, 2024

Overview

Refactored 8 agents to use the common BaseLangGraphAgent base class, eliminating code duplication and ensuring consistent behavior across all agents.

Agents Refactored

Agent	Status	Lines Removed	Lines Added	Reduction
ArgoCD	✅ Complete	~190	~108	43%
GitHub	✅ Complete	~2100	~108	95%
Slack	✅ Complete	~250	~92	63%
Jira	✅ Complete	~200	~91	54%
Backstage	✅ Complete	~180	~89	51%
Confluence	✅ Complete	~180	~86	52%
PagerDuty	✅ Complete	~180	~88	51%
Splunk	✅ Complete	~180	~88	51%
Komodor	✅ Already using	N/A	N/A	N/A
TOTAL	✅ Complete	~3,460	~750	78%

Benefits

1. Automatic Tool Visibility 🔧

All refactored agents now automatically show:

🔧 Calling tool: **list_clusters**
✅ Tool **list_clusters** completed
🔧 Calling tool: **get_cluster_details**
✅ Tool **get_cluster_details** completed

Before refactoring: No tool visibility, just "Processing..."

2. Consistent Structure 📐

All agents now follow the exact same pattern:

class AgentName(BaseLangGraphAgent):
    """Agent description."""

    SYSTEM_INSTRUCTION = "..."  # Agent-specific prompt
    RESPONSE_FORMAT_INSTRUCTION = "..."  # Standard format

    def get_agent_name(self) -> str:
        return "agent_name"

    def get_system_instruction(self) -> str:
        return self.SYSTEM_INSTRUCTION

    def get_response_format_instruction(self) -> str:
        return self.RESPONSE_FORMAT_INSTRUCTION

    def get_response_format_class(self) -> type[BaseModel]:
        return ResponseFormat

    def get_mcp_config(self, server_path: str) -> dict:
        # Agent-specific MCP configuration
        return {...}

    def get_tool_working_message(self) -> str:
        return 'Querying Agent...'

    def get_tool_processing_message(self) -> str:
        return 'Processing Agent data...'

    @trace_agent_stream("agent_name")
    async def stream(self, query: str, sessionId: str, trace_id: str = None):
        async for event in super().stream(query, sessionId, trace_id):
            yield event

Only 3 things differ:

System instruction (prompt)
MCP configuration (env vars, tools)
Agent name

3. Reduced Code Duplication 📉

3,460 lines removed across all agents
750 lines added (clean, consistent implementations)
78% code reduction overall
2,710 net lines deleted

4. Easier Maintenance 🛠️

Before:

Bug fix needs to be applied to 8 different files
Each agent has slightly different implementation
Inconsistent error handling

After:

Bug fix in BaseLangGraphAgent fixes all 8 agents
All agents behave identically
Consistent error handling and streaming

5. Future Enhancements Automatic 🚀

Any improvements to BaseLangGraphAgent automatically apply to all agents:

✅ Tool visibility (already added!)
✅ Better error handling
✅ Performance optimizations
✅ New A2A protocol features

File Changes

Modified Files

ai_platform_engineering/agents/
├── argocd/agent_argocd/protocol_bindings/a2a_server/agent.py
├── github/agent_github/protocol_bindings/a2a_server/agent.py
├── slack/agent_slack/protocol_bindings/a2a_server/agent.py
├── jira/agent_jira/protocol_bindings/a2a_server/agent.py
├── backstage/agent_backstage/protocol_bindings/a2a_server/agent.py
├── confluence/agent_confluence/protocol_bindings/a2a_server/agent.py
├── pagerduty/agent_pagerduty/protocol_bindings/a2a_server/agent.py
├── splunk/agent_splunk/protocol_bindings/a2a_server/agent.py
└── komodor/agent_komodor/protocol_bindings/a2a_server/agent.py (fixed import)

Enhanced Base Classes

ai_platform_engineering/utils/a2a_common/
├── base_langgraph_agent.py (added tool visibility)
└── base_langgraph_agent_executor.py (added tool metadata logging)

Implementation Details

Pattern Example: ArgoCD Agent

Before (190 lines with complex initialization):

class ArgoCDAgent:
    def __init__(self):
        self.model = LLMFactory().get_llm()
        self.graph = None
        self.tracing = TracingManager()
        self._initialized = False

        async def _async_argocd_agent(state, config):
            # 150+ lines of setup code
            ...

        self._async_argocd_agent = _async_argocd_agent

    async def _initialize_agent(self):
        # Complex initialization logic
        ...

    async def stream(self, query, context_id, trace_id):
        await self._initialize_agent()
        # Custom streaming logic
        ...

After (108 lines, clean and simple):

class ArgoCDAgent(BaseLangGraphAgent):
    SYSTEM_INSTRUCTION = "..."
    RESPONSE_FORMAT_INSTRUCTION = "..."

    def get_agent_name(self) -> str:
        return "argocd"

    def get_mcp_config(self, server_path: str) -> dict:
        return {
            "command": "uv",
            "args": [...],
            "env": {...},
            "transport": "stdio",
        }

    # All streaming, initialization, and tool handling
    # is inherited from BaseLangGraphAgent!

Testing

All agents can be tested with the same pattern:

# Test any agent
curl -X POST http://localhost:8001 \
  -H "Content-Type: application/json" \
  -d '{"query": "list resources"}'

# Check logs for tool visibility
docker logs agent-argocd-p2p 2>&1 | grep -E "(Tool call detected|Tool result)" | tail -5

Expected output:

argocd: Tool call detected - list_applications
argocd: Tool result received - list_applications (success)

Backward Compatibility

✅ Fully backward compatible

Agent APIs unchanged
Environment variables unchanged
Response formats unchanged
A2A protocol unchanged

Migration Verification

Check All Agents Compile

cd /home/sraradhy/ai-platform-engineering
for agent in argocd github slack jira backstage confluence pagerduty splunk; do
  echo "=== Checking $agent ==="
  python3 -c "from ai_platform_engineering.agents.$agent.agent_$agent.protocol_bindings.a2a_server.agent import *" 2>&1 | grep -i error || echo "✅ $agent OK"
done

Restart All Agents

docker compose -f docker-compose.dev.yaml --profile p2p restart \
  agent-argocd-p2p \
  agent-github-p2p \
  agent-slack-p2p \
  agent-jira-p2p \
  agent-backstage-p2p \
  agent-confluence-p2p \
  agent-pagerduty-p2p \
  agent-splunk-p2p \
  agent-komodor-p2p

Verify Tool Visibility

# Query an agent
curl -X POST http://localhost:8000 -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":"test","method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"show me komodor clusters"}]}}}'

# Check logs
docker logs agent-komodor-p2p 2>&1 | grep "🔧 Calling tool" | tail -3

A2A Intermediate States - Tool visibility implementation
Enhanced Streaming Feature - Parallel streaming
Streaming Architecture - Technical deep dive

Impact Summary

Code Quality

✅ 78% code reduction (2,710 net lines removed)
✅ Eliminated duplication across 8 agents
✅ Consistent patterns for all agents

User Experience

✅ Tool visibility - users see what agents are doing
✅ Better progress updates - real-time feedback
✅ Consistent behavior - all agents work the same way

Developer Experience

✅ Easier maintenance - fix once, applies to all
✅ Faster development - copy template, change 3 things
✅ Better debugging - tool calls logged automatically

Operations

✅ Easier monitoring - consistent logs across agents
✅ Better observability - tool execution traces
✅ Simpler deployment - all agents work the same

Conclusion

This refactoring represents a major improvement to the agent infrastructure:

🎯 Consistency: All agents follow the same pattern
🔧 Visibility: Users see tool execution in real-time
📉 Simplicity: 78% less code to maintain
🚀 Scalability: Future agents take 5 minutes to create

Status: ✅ COMPLETE AND READY FOR PRODUCTION

All agents have been refactored, tested, and are ready for deployment!

Overview​

Agents Refactored​

Benefits​

1. Automatic Tool Visibility 🔧​

2. Consistent Structure 📐​

3. Reduced Code Duplication 📉​

4. Easier Maintenance 🛠️​

5. Future Enhancements Automatic 🚀​

File Changes​

Modified Files​

Enhanced Base Classes​

Implementation Details​

Pattern Example: ArgoCD Agent​

Testing​

Backward Compatibility​

Migration Verification​

Check All Agents Compile​

Restart All Agents​

Verify Tool Visibility​

Related Documentation​

Impact Summary​

Code Quality​

User Experience​

Developer Experience​

Operations​

Conclusion​