ADR-005: Enhanced OpenAPI Conversion for Swagger 2.0 Specifications

Status

Proposed - 2024-11-09

Context

The current enhance_and_generate.py script provides basic Swagger 2.0 to OpenAPI 3.x conversion, but it produces incomplete or invalid output for complex real-world specifications such as the Argo Workflows API.

Current Limitations

Based on analysis of argo-openapi.json (771KB, 255 operations), we identified:

Incomplete Swagger 2.0 Conversion: Only handles basic fields (swagger, basePath, schemes, host)
Invalid Body Parameters: 29 parameters with "in": "body" are removed but not converted to proper requestBody sections
Schema Validation Issues: Parameters lack proper schema or content fields required by OpenAPI 3.x
Limited Error Handling: Insufficient validation and recovery mechanisms
Manual Fixes Required: Significant post-conversion cleanup needed

Business Impact

Development Overhead: Manual fixes required for each Swagger 2.0 specification
Error-Prone Process: Inconsistent conversion results across different APIs
Limited Legacy Support: Many enterprise APIs still use Swagger 2.0 format
Reduced Reliability: MCP code generation fails on improperly converted specs

Decision

We will implement a hybrid approach combining rule-based conversion with iterative LLM-powered intelligence to provide comprehensive, automated Swagger 2.0 to OpenAPI 3.x conversion.

Core Improvements

1. Complete Swagger 2.0 Schema Conversion

# Convert all Swagger 2.0 constructs:
definitions -> components/schemas
parameters -> components/parameters
responses -> components/responses
securityDefinitions -> components/securitySchemes
# Update $ref paths: #/definitions/* -> #/components/schemas/*
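
The relocation and $ref rewrite listed above could be sketched as follows. This is a minimal illustration, not the project's implementation: the names SECTION_MAP, rewrite_refs, and move_to_components are hypothetical, and rewriting #/parameters/* and #/responses/* references alongside #/definitions/* is an assumption that follows from the section mapping.

```python
import copy

# Swagger 2.0 top-level section -> OpenAPI 3.x components section.
SECTION_MAP = {
    "definitions": "schemas",
    "parameters": "parameters",
    "responses": "responses",
    "securityDefinitions": "securitySchemes",
}


def rewrite_refs(node):
    """Recursively rewrite Swagger 2.0 $ref targets to their components/* location."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                for old, new in SECTION_MAP.items():
                    prefix = f"#/{old}/"
                    if value.startswith(prefix):
                        value = f"#/components/{new}/" + value[len(prefix):]
                        break
                out[key] = value
            else:
                out[key] = rewrite_refs(value)
        return out
    if isinstance(node, list):
        return [rewrite_refs(item) for item in node]
    return node


def move_to_components(swagger):
    """Return a copy of the spec with reusable sections moved under components."""
    spec = copy.deepcopy(swagger)  # never mutate the caller's spec
    components = spec.setdefault("components", {})
    for old, new in SECTION_MAP.items():
        if old in spec:  # only top-level sections are relocated
            components[new] = spec.pop(old)
    return rewrite_refs(spec)
```

The sketch covers relocation only. Setting the openapi version field, converting host/basePath/schemes to servers, and reshaping the moved objects themselves are separate steps.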

2. Intelligent Body Parameter Conversion

# Transform body parameters to requestBody:
- Extract schema from body parameters
- Create proper requestBody with media types
- Preserve descriptions and examples
- Handle multiple content types (application/json, etc.)
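
A rule-based version of this transformation might look like the sketch below. The function name body_param_to_request_body and the default_consumes argument are hypothetical. The sketch assumes the operation's own consumes list (or the supplied default) determines the media types, and it does not handle formData parameters.

```python
def body_param_to_request_body(operation, default_consumes=("application/json",)):
    """Return a copy of a Swagger 2.0 operation whose body parameter
    has been replaced by an OpenAPI 3.x requestBody."""
    op = dict(operation)  # shallow copy; the original operation is left untouched
    params = op.get("parameters", [])
    # Swagger 2.0 allows at most one body parameter per operation.
    body = next((p for p in params if p.get("in") == "body"), None)
    if body is None:
        return op

    media_types = op.pop("consumes", None) or list(default_consumes)
    schema = body.get("schema", {})
    request_body = {"content": {mt: {"schema": schema} for mt in media_types}}
    if "description" in body:
        request_body["description"] = body["description"]
    if body.get("required"):
        request_body["required"] = True

    op["requestBody"] = request_body
    remaining = [p for p in params if p.get("in") != "body"]
    if remaining:
        op["parameters"] = remaining
    else:
        op.pop("parameters", None)
    return op
```

Examples embedded in the body schema are carried over unchanged because the schema object is reused as-is.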

3. Enhanced Parameter Schema Fixing

# Improved type inference:
- Pattern-based detection (timestamps, IDs, counts)
- OpenAPI convention compliance
- Array parameter handling with items
- Complex object parameter support
- Parameter reference resolution
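
One way to picture the pattern-based part of this step is the sketch below. The name heuristics in NAME_HINTS are illustrative assumptions rather than rules taken from the project, and fix_parameter_schema is a hypothetical helper. It moves Swagger 2.0 inline type fields into an OpenAPI 3.x schema object and falls back to name-based inference only when no type is declared.

```python
import re

# Illustrative name heuristics (assumptions); real rules would be tuned per API.
NAME_HINTS = [
    (re.compile(r"(?:timestamp|time|_at)$", re.IGNORECASE), {"type": "string", "format": "date-time"}),
    (re.compile(r"(?:count|limit|offset|size)$", re.IGNORECASE), {"type": "integer"}),
    (re.compile(r"(?:^|[._-])u?id$|Id$"), {"type": "string"}),
]

# Swagger 2.0 parameter fields that belong inside schema in OpenAPI 3.x.
SCHEMA_FIELDS = ("type", "format", "items", "enum", "default", "minimum", "maximum", "pattern")


def infer_schema_from_name(name):
    """Guess a schema from the parameter name; default to a plain string."""
    for pattern, schema in NAME_HINTS:
        if pattern.search(name):
            return dict(schema)
    return {"type": "string"}


def fix_parameter_schema(param):
    """Return a copy of a non-body parameter that carries a schema object."""
    if "schema" in param or "content" in param or param.get("in") == "body":
        return dict(param)  # already valid, or handled by body conversion
    fixed = {k: v for k, v in param.items() if k not in SCHEMA_FIELDS}
    schema = {k: param[k] for k in SCHEMA_FIELDS if k in param}
    if "type" not in schema:
        schema.update(infer_schema_from_name(param.get("name", "")))
    if schema.get("type") == "array" and "items" not in schema:
        schema["items"] = {"type": "string"}  # arrays must declare items
    fixed["schema"] = schema
    return fixed
```

Declared types always win over inferred ones, so the heuristics can only fill gaps and never override what the source specification states.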

4. LLM-Powered Iterative Conversion

# Intelligent conversion using existing LLM infrastructure:
- Leverage existing LLMFactory and prompt.yaml system
- Context-aware parameter type inference
- Smart schema transformation with validation
- Self-correcting through iterative refinement
- Preserve semantic meaning during conversion

5. Comprehensive Validation Framework

# Multi-stage validation:
- Pre-conversion spec validation
- Post-conversion OpenAPI 3.x compliance
- Parameter schema completeness check
- Reference integrity verification
- Rollback on conversion failure
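
As an illustration of the reference integrity and rollback stages, the sketch below checks that every local $ref resolves and returns the original spec when it does not. The function names are hypothetical, and the converter is passed in as a callable so the sketch stays independent of any particular conversion code.

```python
import copy


def _resolves(spec, ref):
    """Follow a local JSON Pointer reference ('#/a/b') through the spec."""
    node = spec
    for token in ref[2:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
        if isinstance(node, dict) and token in node:
            node = node[token]
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        else:
            return False
    return True


def find_dangling_refs(spec):
    """Return the sorted local $ref strings whose targets are missing."""
    dangling = set()
    stack = [spec]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/") and not _resolves(spec, ref):
                dangling.add(ref)
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return sorted(dangling)


def convert_with_rollback(original, convert):
    """Run convert on a copy; keep the original if references break."""
    candidate = convert(copy.deepcopy(original))
    problems = find_dangling_refs(candidate)
    if problems:
        return original, problems  # rollback: the caller still has a usable spec
    return candidate, []
```

Only local references are checked here. A fuller validator would also run schema-level OpenAPI 3.x validation before accepting the candidate.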

LLM-Powered Iterative Approach

Leveraging Existing Infrastructure

The system already has robust LLM integration through:

LLMFactory from cnoe_agent_utils: Multi-provider support (OpenAI, Anthropic, Google)
Declarative Prompts: Sophisticated prompt.yaml configuration system
LangChain Integration: Production-ready LLM pipeline with error handling

Iterative Conversion Pipeline

LLM Conversion Prompts

1. Body Parameter Conversion

# Addition to prompt.yaml
swagger_body_parameter_conversion:
  system_prompt: |
    You are an expert at converting Swagger 2.0 body parameters to OpenAPI 3.x requestBody format.

    CONVERSION RULES:
    1. Extract schema from body parameter
    2. Create requestBody with proper media types
    3. Preserve descriptions, examples, and constraints
    4. Handle array/object schemas correctly
    5. Use application/json as default media type

    INPUT: Swagger 2.0 body parameter object
    OUTPUT: Valid OpenAPI 3.x requestBody object (JSON format)

  user_prompt_template: |
    Convert this Swagger 2.0 body parameter to OpenAPI 3.x requestBody:

    Parameter: {body_parameter}
    Operation: {method} {path}

    Return only valid JSON requestBody object.
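
The sketch below shows one way the user prompt above could be rendered and the reply checked before use. It deliberately avoids any LLMFactory or LangChain calls, since their exact interfaces are not described here; render_body_prompt and parse_json_reply are hypothetical helpers. Paths such as /workflows/{namespace} contain braces, which is safe because they are passed to the template as values rather than being part of it.

```python
import json

# Copied from the user_prompt_template above.
USER_PROMPT_TEMPLATE = (
    "Convert this Swagger 2.0 body parameter to OpenAPI 3.x requestBody:\n\n"
    "Parameter: {body_parameter}\n"
    "Operation: {method} {path}\n\n"
    "Return only valid JSON requestBody object."
)

FENCE = "`" * 3  # models often wrap JSON in a Markdown code fence


def render_body_prompt(body_parameter, method, path):
    """Fill the template with a deterministic JSON rendering of the parameter."""
    return USER_PROMPT_TEMPLATE.format(
        body_parameter=json.dumps(body_parameter, sort_keys=True),
        method=method.upper(),
        path=path,
    )


def parse_json_reply(reply):
    """Extract a requestBody object from a model reply, or raise ValueError."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with optional language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or "content" not in data:
        raise ValueError("reply is not a requestBody object")
    return data
```

Rejecting replies that lack a content map gives the refinement loop a concrete error to act on instead of passing a malformed object downstream.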

2. Schema Type Inference

schema_type_inference:
  system_prompt: |
    You are an expert at inferring parameter types from context and naming conventions.

    INFERENCE RULES:
    - Analyze parameter name patterns
    - Consider operation context
    - Use OpenAPI 3.x type system
    - Provide format specifiers when appropriate
    - Handle arrays with proper items definitions

  user_prompt_template: |
    Infer schema for parameter: {param_name}
    Location: {param_in}
    Operation: {operation_context}
    Original type: {original_type}

    Generate OpenAPI 3.x schema object (JSON).

Iterative Refinement Process

Initial Conversion: Rule-based + LLM analysis
Validation Pass: Check OpenAPI 3.x compliance
Error Analysis: LLM identifies specific issues
Targeted Fixes: LLM generates precise corrections
Re-validation: Verify fixes maintain spec integrity
Iteration Limit: Maximum 3 refinement cycles
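
The control flow of these steps could be sketched as below. The validate and fix callables are placeholders for the compliance validator and the LLM-backed correction step, whose real interfaces are not defined in this ADR. The rule that a fix is kept only when it does not increase the issue count is an added assumption.

```python
def refine(spec, validate, fix, max_iterations=3):
    """Validate, then apply targeted fixes until clean or the limit is reached.

    validate(spec) returns a list of issues; fix(spec, issues) returns a new spec.
    Returns (spec, remaining_issues, iterations_used).
    """
    issues = validate(spec)
    iterations = 0
    while issues and iterations < max_iterations:
        candidate = fix(spec, issues)
        candidate_issues = validate(candidate)  # re-validation pass
        iterations += 1
        # Keep a fix only if it does not make the spec worse.
        if len(candidate_issues) <= len(issues):
            spec, issues = candidate, candidate_issues
    return spec, issues, iterations
```

Returning the remaining issues and the iteration count lets the caller log conversion statistics and decide whether to roll back.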

Configuration Integration

# config.yaml enhancement
conversion:
  enabled: true
  use_llm: true
  llm_provider: "openai"  # or anthropic, google
  max_iterations: 3
  batch_size: 10
  strict_validation: true
  preserve_extensions: true

  # LLM-specific settings
  llm_settings:
    temperature: 0.1  # Low for consistency
    max_tokens: 2000
    timeout: 30
    retry_attempts: 2

Implementation Plan

Phase 1: LLM-Powered Conversion Foundation

Extend prompt.yaml with conversion-specific prompts
Implement LLMSwaggerConverter class using existing LLMFactory
Add body parameter → requestBody LLM conversion
Implement iterative schema type inference
Add conversion logging and statistics

Phase 2: Validation and Refinement Loop

Implement OpenAPI 3.x compliance validator
Add LLM-powered error analysis and correction
Create iterative refinement loop with max iterations
Add backup and rollback mechanisms
Integrate conversion caching for efficiency
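
For the caching item, one possible shape is a content-addressed cache keyed on the prompt name plus a canonical JSON rendering of the input, so that identical parameters across operations trigger a single LLM call. ConversionCache is a hypothetical class, and the in-memory store is an assumption made for brevity.

```python
import hashlib
import json


class ConversionCache:
    """In-memory cache for LLM conversion results, keyed by input content."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(prompt_name, payload):
        # Sorted keys and compact separators make the key independent of dict order.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{prompt_name}\n{canonical}".encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt_name, payload, compute):
        """Return the cached result, calling compute(payload) only on a miss."""
        k = self.key(prompt_name, payload)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = compute(payload)
        return self._store[k]
```

The hit and miss counters feed directly into the conversion statistics mentioned in Phase 1.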

Phase 3: Integration and Optimization

Test with argo-workflows specification (29 body parameters)
Validate full pipeline: conversion → overlay → MCP generation
Performance optimization for large specs (batch processing)
Add configuration options for LLM provider selection
Documentation and usage examples

Consequences

Positive

Intelligent Conversion: LLM understands context and preserves semantic meaning
Self-Correcting: Iterative refinement automatically fixes conversion issues
Automated Legacy Support: Seamless handling of complex Swagger 2.0 specifications
Reduced Manual Effort: Elimination of post-conversion fixes through smart inference
Enhanced Developer Experience: One-command conversion with intelligent feedback
Contextual Understanding: LLM leverages API patterns and naming conventions
Future-Proof: Adapts to new OpenAPI features and edge cases

Negative

LLM Dependencies: Requires API keys and network connectivity for optimal performance
Cost Considerations: LLM API calls add operational costs (mitigated by caching)
Performance Impact: Additional LLM calls may increase processing time
Non-Deterministic Results: LLM outputs may vary slightly between runs
Complexity: More sophisticated pipeline with multiple components to maintain

Risks and Mitigations

Risk: Breaking existing OpenAPI 3.x specs
Mitigation: Detect and skip conversion for already-valid specs

Risk: Complex edge cases in real-world specifications
Mitigation: Incremental rollout with comprehensive logging

Risk: Performance degradation on large specifications
Mitigation: Batch processing and LLM response caching

Risk: LLM service unavailability or API limits
Mitigation: Rule-based fallback system and graceful degradation
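
Two of these mitigations are simple enough to sketch: detecting the spec version so that OpenAPI 3.x input is skipped, and falling back to rule-based conversion when the LLM call fails. Both function names are hypothetical. Catching every Exception is a simplification; real code would likely catch provider-specific timeout and rate-limit errors.

```python
def detect_spec_version(spec):
    """Classify a parsed spec as 'swagger2', 'openapi3', or 'unknown'."""
    if str(spec.get("swagger", "")).startswith("2."):
        return "swagger2"
    if str(spec.get("openapi", "")).startswith("3."):
        return "openapi3"
    return "unknown"


def convert_with_fallback(item, llm_convert, rule_convert):
    """Try the LLM path first; degrade to the rule-based path on any failure.

    Returns (result, source) where source is 'llm' or 'rules'.
    """
    try:
        return llm_convert(item), "llm"
    except Exception:  # simplification: narrow this to provider errors in practice
        return rule_convert(item), "rules"
```

Returning the source alongside the result makes it possible to log how often the pipeline ran in degraded mode.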

Success Metrics

Quantitative Metrics

Conversion Success Rate: >95% for Swagger 2.0 specifications
Parameter Fix Rate: 100% of parameters have valid schemas
Body Parameter Conversion: 100% converted to proper requestBody
LLM Accuracy: >90% first-pass correctness for schema inference
Iteration Efficiency: <2 average iterations per problematic operation
Performance: <25% increase in processing time (accounting for LLM calls)

Qualitative Metrics

Developer Feedback: Reduced support requests for conversion issues
Code Quality: Generated MCP code compiles without manual fixes
Semantic Preservation: LLM maintains API intent during conversion
Maintenance: Reduced manual intervention in conversion pipeline

Test Cases

Primary Test Case: Argo Workflows

Input: examples/argo-workflows/argo-openapi.json (Swagger 2.0, 771KB, 255 operations)
Expected: Valid OpenAPI 3.x with all 29 body parameters converted
Validation: Successful MCP code generation and compilation

Additional Test Cases

Petstore API: Basic Swagger 2.0 conversion validation
Complex Enterprise APIs: Real-world specifications with edge cases
Already-Valid OpenAPI 3.x: Ensure no regression in existing specs

Implementation Details

File Structure (Updated)

Current Unified Structure

openapi_mcp_codegen/
├── openapi_enhancer.py (unified: overlay generation + application + enhancement)
├── mcp_codegen.py (existing: MCP server code generation)
├── prompt.yaml (extended with conversion prompts)
└── templates/ (existing template directory)

Planned LLM Enhancement Structure

openapi_mcp_codegen/
├── openapi_enhancer.py (enhanced with LLM conversion)
├── conversion/ (new: LLM-powered conversion modules)
│   ├── __init__.py
│   ├── llm_converter.py (new: LLM conversion logic)
│   ├── iterative_refiner.py (new: refinement loop)
│   └── swagger_analyzer.py (new: Swagger 2.0 analysis)
├── validation/
│   ├── __init__.py
│   ├── openapi_validator.py (new: spec validation)
│   ├── conversion_validator.py (new: conversion verification)
│   └── llm_validator.py (new: LLM-based validation)
├── prompt.yaml (extended with conversion prompts)
└── templates/ (existing template directory)

Consolidation Completed:

✅ Merged enhance_and_generate.py, overlay_applier.py, overlay_generator.py → openapi_enhancer.py
✅ Unified CLI with subcommands: enhance, generate-overlay, apply-overlay
✅ Reduced codebase complexity while preserving all functionality

Configuration Options

# config.yaml enhancement options
conversion:
  enabled: true
  use_llm: true
  llm_provider: "openai"  # or anthropic, google
  max_iterations: 3
  batch_size: 10
  backup_original: true
  strict_validation: true
  rollback_on_failure: true
  log_conversion_stats: true
  preserve_extensions: true

  # LLM-specific settings
  llm_settings:
    temperature: 0.1  # Low for consistency
    max_tokens: 2000
    timeout: 30
    retry_attempts: 2
    enable_caching: true
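
To show how these options might be consumed, the sketch below maps an already-parsed config.yaml dictionary onto typed defaults and rejects unknown keys. The dataclass names and the loader are hypothetical. YAML parsing is left to the caller, so the sketch needs only the standard library.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class LLMSettings:
    temperature: float = 0.1
    max_tokens: int = 2000
    timeout: int = 30
    retry_attempts: int = 2
    enable_caching: bool = True


@dataclass(frozen=True)
class ConversionConfig:
    enabled: bool = True
    use_llm: bool = True
    llm_provider: str = "openai"
    max_iterations: int = 3
    batch_size: int = 10
    backup_original: bool = True
    strict_validation: bool = True
    rollback_on_failure: bool = True
    log_conversion_stats: bool = True
    preserve_extensions: bool = True
    llm_settings: LLMSettings = field(default_factory=LLMSettings)


def _build(cls, raw):
    """Instantiate cls from a dict, refusing keys the dataclass does not define."""
    unknown = set(raw) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"unknown {cls.__name__} keys: {sorted(unknown)}")
    return cls(**raw)


def load_conversion_config(raw):
    """Build a ConversionConfig from the parsed contents of config.yaml."""
    section = dict(raw.get("conversion", {}))
    section["llm_settings"] = _build(LLMSettings, section.get("llm_settings", {}))
    config = _build(ConversionConfig, section)
    if config.max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    return config
```

Rejecting unknown keys surfaces misspelled or renamed options at startup instead of silently ignoring them.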

References

OpenAPI 3.0 Specification
Swagger 2.0 to OpenAPI 3.0 Migration Guide
Argo Workflows OpenAPI Specification (see examples/argo-workflows/argo-openapi-unedited.json in repository)
ADR-004: OpenAPI Overlay Enhancement
ADR-002: OpenAPI Specification Automatic Fixes and Enhancements

Approval

Author: Assistant
Reviewers: [To be assigned]
Decision Date: [To be determined]
Review Date: [To be scheduled after implementation]
