ADR-005: Enhanced OpenAPI Conversion for Swagger 2.0 Specifications
Status
Proposed - 2024-11-09
Context
The current enhance_and_generate.py script provides basic Swagger 2.0 to OpenAPI 3.x conversion capabilities, but encounters significant issues when processing complex real-world specifications like the Argo Workflows API.
Current Limitations
Based on analysis of argo-openapi.json (771KB, 255 operations), we identified:
- Incomplete Swagger 2.0 Conversion: Only handles basic fields (
swagger,basePath,schemes,host) - Invalid Body Parameters: 29 parameters with
"in": "body"are removed but not converted to properrequestBodysections - Schema Validation Issues: Parameters lack proper
schemaorcontentfields required by OpenAPI 3.x - Limited Error Handling: Insufficient validation and recovery mechanisms
- Manual Fixes Required: Significant post-conversion cleanup needed
Business Impact
- Development Overhead: Manual fixes required for each Swagger 2.0 specification
- Error-Prone Process: Inconsistent conversion results across different APIs
- Limited Legacy Support: Many enterprise APIs still use Swagger 2.0 format
- Reduced Reliability: MCP code generation fails on improperly converted specs
Decision
We will implement a hybrid approach combining rule-based conversion with iterative LLM-powered intelligence to provide comprehensive, automated Swagger 2.0 to OpenAPI 3.x conversion.
Core Improvements
1. Complete Swagger 2.0 Schema Conversion
# Convert all Swagger 2.0 constructs:
definitions -> components/schemas
parameters -> components/parameters
responses -> components/responses
securityDefinitions -> components/securitySchemes
# Update $ref paths: #/definitions/* -> #/components/schemas/*
2. Intelligent Body Parameter Conversion
# Transform body parameters to requestBody:
- Extract schema from body parameters
- Create proper requestBody with media types
- Preserve descriptions and examples
- Handle multiple content types (application/json, etc.)
3. Enhanced Parameter Schema Fixing
# Improved type inference:
- Pattern-based detection (timestamps, IDs, counts)
- OpenAPI convention compliance
- Array parameter handling with items
- Complex object parameter support
- Parameter reference resolution
4. LLM-Powered Iterative Conversion
# Intelligent conversion using existing LLM infrastructure:
- Leverage existing LLMFactory and prompt.yaml system
- Context-aware parameter type inference
- Smart schema transformation with validation
- Self-correcting through iterative refinement
- Preserve semantic meaning during conversion
5. Comprehensive Validation Framework
# Multi-stage validation:
- Pre-conversion spec validation
- Post-conversion OpenAPI 3.x compliance
- Parameter schema completeness check
- Reference integrity verification
- Rollback on conversion failure
LLM-Powered Iterative Approach
Leveraging Existing Infrastructure
The system already has robust LLM integration through:
- LLMFactory from cnoe_agent_utils: Multi-provider support (OpenAI, Anthropic, Google)
- Declarative Prompts: Sophisticated
prompt.yamlconfiguration system - LangChain Integration: Production-ready LLM pipeline with error handling
Iterative Conversion Pipeline
LLM Conversion Prompts
1. Body Parameter Conversion
# Addition to prompt.yaml
swagger_body_parameter_conversion:
system_prompt: |
You are an expert at converting Swagger 2.0 body parameters to OpenAPI 3.x requestBody format.
CONVERSION RULES:
1. Extract schema from body parameter
2. Create requestBody with proper media types
3. Preserve descriptions, examples, and constraints
4. Handle array/object schemas correctly
5. Use application/json as default media type
INPUT: Swagger 2.0 body parameter object
OUTPUT: Valid OpenAPI 3.x requestBody object (JSON format)
user_prompt_template: |
Convert this Swagger 2.0 body parameter to OpenAPI 3.x requestBody:
Parameter: {body_parameter}
Operation: {method} {path}
Return only valid JSON requestBody object.
2. Schema Type Inference
schema_type_inference:
system_prompt: |
You are an expert at inferring parameter types from context and naming conventions.
INFERENCE RULES:
- Analyze parameter name patterns
- Consider operation context
- Use OpenAPI 3.x type system
- Provide format specifiers when appropriate
- Handle arrays with proper items definitions
user_prompt_template: |
Infer schema for parameter: {param_name}
Location: {param_in}
Operation: {operation_context}
Original type: {original_type}
Generate OpenAPI 3.x schema object (JSON).
Iterative Refinement Process
- Initial Conversion: Rule-based + LLM analysis
- Validation Pass: Check OpenAPI 3.x compliance
- Error Analysis: LLM identifies specific issues
- Targeted Fixes: LLM generates precise corrections
- Re-validation: Verify fixes maintain spec integrity
- Iteration Limit: Maximum 3 refinement cycles
Configuration Integration
# config.yaml enhancement
conversion:
enabled: true
use_llm: true
llm_provider: "openai" # or anthropic, google
max_iterations: 3
batch_size: 10
validation_strict: true
preserve_extensions: true
# LLM-specific settings
llm_settings:
temperature: 0.1 # Low for consistency
max_tokens: 2000
timeout: 30
retry_attempts: 2
Implementation Plan
Phase 1: LLM-Powered Conversion Foundation
- Extend
prompt.yamlwith conversion-specific prompts - Implement
LLMSwaggerConverterclass using existing LLMFactory - Add body parameter → requestBody LLM conversion
- Implement iterative schema type inference
- Add conversion logging and statistics
Phase 2: Validation and Refinement Loop
- Implement OpenAPI 3.x compliance validator
- Add LLM-powered error analysis and correction
- Create iterative refinement loop with max iterations
- Add backup and rollback mechanisms
- Integrate conversion caching for efficiency
Phase 3: Integration and Optimization
- Test with argo-workflows specification (29 body parameters)
- Validate full pipeline: conversion → overlay → MCP generation
- Performance optimization for large specs (batch processing)
- Add configuration options for LLM provider selection
- Documentation and usage examples
Consequences
Positive
- Intelligent Conversion: LLM understands context and preserves semantic meaning
- Self-Correcting: Iterative refinement automatically fixes conversion issues
- Automated Legacy Support: Seamless handling of complex Swagger 2.0 specifications
- Reduced Manual Effort: Elimination of post-conversion fixes through smart inference
- Enhanced Developer Experience: One-command conversion with intelligent feedback
- Contextual Understanding: LLM leverages API patterns and naming conventions
- Future-Proof: Adapts to new OpenAPI features and edge cases
Negative
- LLM Dependencies: Requires API keys and network connectivity for optimal performance
- Cost Considerations: LLM API calls add operational costs (mitigated by caching)
- Performance Impact: Additional LLM calls may increase processing time
- Non-Deterministic Results: LLM outputs may vary slightly between runs
- Complexity: More sophisticated pipeline with multiple components to maintain
Risks and Mitigations
- Risk: Breaking existing OpenAPI 3.x specs Mitigation: Detect and skip conversion for already-valid specs
- Risk: Complex edge cases in real-world specifications Mitigation: Incremental rollout with comprehensive logging
- Risk: Performance degradation on large specifications Mitigation: Batch processing and LLM response caching
- Risk: LLM service unavailability or API limits Mitigation: Rule-based fallback system and graceful degradation
Success Metrics
Quantitative Metrics
- Conversion Success Rate: >95% for Swagger 2.0 specifications
- Parameter Fix Rate: 100% of parameters have valid schemas
- Body Parameter Conversion: 100% converted to proper requestBody
- LLM Accuracy: >90% first-pass correctness for schema inference
- Iteration Efficiency: <2 average iterations per problematic operation
- Performance: <25% increase in processing time (accounting for LLM calls)
Qualitative Metrics
- Developer Feedback: Reduced support requests for conversion issues
- Code Quality: Generated MCP code compiles without manual fixes
- Semantic Preservation: LLM maintains API intent during conversion
- Maintenance: Reduced manual intervention in conversion pipeline
Test Cases
Primary Test Case: Argo Workflows
- Input:
examples/argo-workflows/argo-openapi.json(Swagger 2.0, 771KB, 255 operations) - Expected: Valid OpenAPI 3.x with all 29 body parameters converted
- Validation: Successful MCP code generation and compilation
Additional Test Cases
- Petstore API: Basic Swagger 2.0 conversion validation
- Complex Enterprise APIs: Real-world specifications with edge cases
- Already-Valid OpenAPI 3.x: Ensure no regression in existing specs
Implementation Details
File Structure (Updated)
Current Unified Structure
openapi_mcp_codegen/
├── openapi_enhancer.py (unified: overlay generation + application + enhancement)
├── mcp_codegen.py (existing: MCP server code generation)
├── prompt.yaml (extended with conversion prompts)
└── templates/ (existing template directory)
Planned LLM Enhancement Structure
openapi_mcp_codegen/
├── openapi_enhancer.py (enhanced with LLM conversion)
├── conversion/ (new: LLM-powered conversion modules)
│ ├── __init__.py
│ ├── llm_converter.py (new: LLM conversion logic)
│ ├── iterative_refiner.py (new: refinement loop)
│ └── swagger_analyzer.py (new: Swagger 2.0 analysis)
├── validation/
│ ├── __init__.py
│ ├── openapi_validator.py (new: spec validation)
│ ├── conversion_validator.py (new: conversion verification)
│ └── llm_validator.py (new: LLM-based validation)
├── prompt.yaml (extended with conversion prompts)
└── templates/ (existing template directory)
Consolidation Completed:
- ✅ Merged
enhance_and_generate.py,overlay_applier.py,overlay_generator.py→openapi_enhancer.py - ✅ Unified CLI with subcommands:
enhance,generate-overlay,apply-overlay - ✅ Reduced codebase complexity while preserving all functionality
Configuration Options
# config.yaml enhancement options
conversion:
enabled: true
use_llm: true
llm_provider: "openai" # or anthropic, google
max_iterations: 3
batch_size: 10
backup_original: true
strict_validation: true
rollback_on_failure: true
log_conversion_stats: true
preserve_extensions: true
# LLM-specific settings
llm_settings:
temperature: 0.1 # Low for consistency
max_tokens: 2000
timeout: 30
retry_attempts: 2
enable_caching: true
References
- OpenAPI 3.0 Specification
- Swagger 2.0 to OpenAPI 3.0 Migration Guide
- Argo Workflows OpenAPI Specification (see examples/argo-workflows/argo-openapi-unedited.json in repository)
- ADR-004: OpenAPI Overlay Enhancement
- ADR-002: OpenAPI Specification Automatic Fixes and Enhancements
Approval
- Author: Assistant
- Reviewers: [To be assigned]
- Decision Date: [To be determined]
- Review Date: [To be scheduled after implementation]