AI Broker Service (Port 8085)

Status: ✅ Implemented | Version: 0.1.0

Overview

The AI Broker service provides a unified interface to various AI models and AI-powered features across the CORTX platform. It handles model routing, prompt management, PII redaction, and integrates with the RAG service for context-aware AI responses.

Core Responsibilities

AI Model Routing

Intelligent Routing: Route each request to the most suitable AI model
Cost Optimization: Select model based on cost/performance trade-offs
Model Selection: Claude, GPT-4, local models
Fallback Handling: Automatic failover on model unavailability
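Fallback routing can be sketched as an ordered list of candidate models tried in turn. This is a minimal sketch. It assumes each provider is wrapped in a callable that raises a hypothetical `ModelUnavailableError` when its model cannot serve the request. The function name and the stub providers are illustrative, not the broker's actual code.

```python
from typing import Callable, Dict, Sequence, Tuple


class ModelUnavailableError(Exception):
    """Hypothetical error a provider wrapper raises when its model cannot serve a request."""


def route_with_fallback(
    prompt: str, candidates: Sequence[str], providers: Dict[str, Callable[[str], str]]
) -> Tuple[str, str]:
    """Try candidate models in order and return (model_id, response) from the first that succeeds."""
    errors: Dict[str, str] = {}
    for model_id in candidates:
        call = providers.get(model_id)
        if call is None:
            errors[model_id] = "not configured"
            continue
        try:
            return model_id, call(prompt)
        except ModelUnavailableError as exc:
            errors[model_id] = str(exc)  # remember why, then fall through to the next candidate
    raise ModelUnavailableError(f"all candidate models failed: {errors}")


def sonnet_outage(prompt: str) -> str:
    # Stub that simulates the primary provider being down.
    raise ModelUnavailableError("provider returned 503")


providers = {
    "claude-3-5-sonnet-20241022": sonnet_outage,
    "gpt-4-turbo": lambda p: f"[gpt-4-turbo] {p}",
}
model, text = route_with_fallback(
    "Explain title insurance", ["claude-3-5-sonnet-20241022", "gpt-4-turbo"], providers
)
print(model)  # gpt-4-turbo
```

In a real router, the candidate list would come from the cost/performance policy rather than being passed in by hand.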

Prompt Management

Template Library: Reusable prompt templates
Prompt Engineering: Optimized prompts for specific tasks
Variable Substitution: Dynamic prompt generation
Version Control: Track prompt iterations
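Python's standard `string.Template` can illustrate variable substitution and version tracking. The in-memory registry, the `explain_failure` template, and `render_prompt` are all hypothetical. The service's real template store sits behind the `/api/ai/prompts` endpoints, and its format is not described here.

```python
from string import Template
from typing import Dict, List, Optional

# Hypothetical in-memory registry: template_id -> versions, oldest first (index 0 is v1).
PROMPT_TEMPLATES: Dict[str, List[str]] = {
    "explain_failure": [
        "Explain why rule $rule_id failed.",
        "You are a compliance assistant. Explain in plain language why rule "
        "$rule_id failed for $entity, and suggest how to fix it.",
    ],
}


def render_prompt(template_id: str, variables: Dict[str, str], version: Optional[int] = None) -> str:
    """Render a template (latest version by default); a missing variable raises KeyError."""
    versions = PROMPT_TEMPLATES[template_id]
    if version is None:
        text = versions[-1]
    elif 1 <= version <= len(versions):
        text = versions[version - 1]
    else:
        raise ValueError(f"{template_id!r} has no version {version}")
    return Template(text).substitute(variables)


print(render_prompt("explain_failure", {"rule_id": "title.verification", "entity": "prop-123"}))
```

`substitute` raises `KeyError` when a variable is missing, so template mistakes surface early. `safe_substitute` would instead leave the placeholder in the output.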

PII Redaction

Automatic Detection: Identify PII in prompts/responses
Redaction: Remove/mask sensitive data before AI processing
Compliance: HIPAA and GDPR
Audit Trail: Log all redaction events

RAG Integration

Context Retrieval: Fetch relevant knowledge from RAG service
Context Injection: Augment prompts with retrieved context
Hierarchical Context: Use suite/module-scoped knowledge
Citation Generation: Link responses to source documents
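Context injection and citation generation can be sketched together. Each retrieved chunk gets a number the model can cite, and the function returns the source titles, as in the `sources` field of the inference response. The `Chunk` type, its `score` field, the prompt wording, and the sample chunks are illustrative assumptions, not the RAG service's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    source: str   # title of the source document
    text: str
    score: float  # retrieval similarity, higher is more relevant


def build_rag_prompt(question: str, chunks: List[Chunk], max_chunks: int = 5) -> Tuple[str, List[str]]:
    """Number the top-scoring chunks so the model can cite them as [1], [2], ...; return (prompt, sources)."""
    top = sorted(chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    context = "\n\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(top, start=1))
    prompt = (
        "Answer using only the numbered context below and cite it as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    sources = list(dict.fromkeys(c.source for c in top))  # de-duplicate while keeping rank order
    return prompt, sources


# Made-up chunks for illustration.
chunks = [
    Chunk("Legal Requirements", "Outstanding liens must be cleared before transfer.", 0.71),
    Chunk("Title Verification Guide", "Confirm the chain of title in county records.", 0.88),
    Chunk("Title Verification Guide", "Match the seller to the recorded owner.", 0.83),
]
prompt, sources = build_rag_prompt("What are the steps for title verification?", chunks)
print(sources)  # ['Title Verification Guide', 'Legal Requirements']
```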

Architecture Diagram

flowchart TB
    Client[Client] -->|AI Request| AIBroker[AI Broker :8085]

    subgraph "Request Processing"
        AIBroker -->|1. Redact PII| Redactor[PII Redactor]
        Redactor -->|2. Select Model| Router[Model Router]
        Router -->|3. Build Prompt| PromptBuilder[Prompt Builder]
        PromptBuilder -->|4. RAG Context| RAG[RAG Service :8138]
    end

    subgraph "Model Execution"
        PromptBuilder -->|Execute| ModelSelector{Model Type}
        ModelSelector -->|Claude| Claude[Anthropic Claude API]
        ModelSelector -->|GPT| OpenAI[OpenAI GPT API]
        ModelSelector -->|Local| LocalModel[Local LLM]
    end

    subgraph "Response Processing"
        Claude -->|Response| ResponseProc[Response Processor]
        OpenAI -->|Response| ResponseProc
        LocalModel -->|Response| ResponseProc
        ResponseProc -->|5. Redact PII| OutputRedactor[Output Redactor]
        OutputRedactor -->|6. Log Usage| Ledger[Ledger Service :8136]
    end

    OutputRedactor -->|Final Response| Client
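Read top to bottom, the diagram describes a fixed pipeline. The sketch below wires its numbered steps together with injected callables. `BrokerPipeline` and the toy stages are hypothetical. The real service would call the RAG and Ledger services over HTTP instead of using in-process functions and a list, and the toy redactor only replaces one fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BrokerPipeline:
    """Hypothetical wiring of the numbered steps in the diagram; every stage is injected."""
    redact: Callable[[str], str]               # steps 1 and 5: PII redaction
    select_model: Callable[[str], str]         # step 2: model routing
    retrieve_context: Callable[[str], str]     # step 4: RAG lookup
    call_model: Callable[[str, str], str]      # execution: (model_id, prompt) -> text
    usage_log: List[Dict[str, object]] = field(default_factory=list)  # stand-in for the Ledger call

    def run(self, prompt: str, use_rag: bool = True) -> str:
        clean = self.redact(prompt)                                   # 1. redact the input
        model_id = self.select_model(clean)                           # 2. select a model
        context = self.retrieve_context(clean) if use_rag else ""     # 3-4. build prompt with RAG context
        full_prompt = f"{context}\n\n{clean}" if context else clean
        output = self.redact(self.call_model(model_id, full_prompt))  # execute, then 5. redact output
        self.usage_log.append({"model": model_id, "prompt_chars": len(full_prompt)})  # 6. log usage
        return output


pipeline = BrokerPipeline(
    redact=lambda s: s.replace("123-45-6789", "***-**-****"),
    select_model=lambda s: "claude-3-haiku-20240307",
    retrieve_context=lambda s: "CONTEXT",
    call_model=lambda model, p: f"{model} saw: {p}",
)
print(pipeline.run("My SSN is 123-45-6789"))
```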

API Endpoints

Health & Status

GET /healthz - Liveness probe
GET /readyz - Readiness probe
GET / - Service metadata

AI Inference

POST /api/ai/inference - Generate AI response

{
  "prompt": "Explain title insurance",
  "model": "claude-3-5-sonnet-20241022",
  "use_rag": true,
  "context": {
    "suite_id": "fedsuite",
    "module_id": "title"
  },
  "max_tokens": 1000
}

POST /api/ai/explain - Explain validation failure

{
  "rule_id": "title.verification",
  "failure_data": {...},
  "include_remediation": true
}

POST /api/ai/generate-workflow - Generate WorkflowPack from natural language

{
  "description": "Create a workflow for title transfer with legal approval",
  "domain": "legal"
}

Model Management

GET /api/ai/models - List available AI models
GET /api/ai/models/{model_id}/status - Get model status
POST /api/ai/models/{model_id}/test - Test model availability

Prompt Management

GET /api/ai/prompts - List prompt templates
POST /api/ai/prompts - Create prompt template
GET /api/ai/prompts/{template_id} - Get prompt template

AI Models

Claude Models (Anthropic)

claude-3-5-sonnet-20241022: Best for complex reasoning
claude-3-haiku-20240307: Fast, cost-effective
claude-3-opus-20240229: Most capable, highest cost

OpenAI Models

gpt-4-turbo: High-quality general purpose
gpt-4: Legacy, still capable
gpt-3.5-turbo: Fast, lower cost

Local Models (Future)

llama-3-70b: On-premise deployment
mistral-7b: Lightweight local model

PII Redaction

Automatic Detection

Social Security Numbers (SSN)
Credit card numbers
Email addresses
Phone numbers
Physical addresses
Custom patterns (configurable)
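A minimal regex-based detector for these pattern types might look like the following. The patterns are illustrative assumptions that target US-style SSN and phone formats. Credit-card candidates are filtered with the Luhn checksum to cut false positives. Physical addresses are left out because regexes handle them poorly; a production detector would typically add a named-entity recognizer for them.

```python
import re
from typing import Dict, Iterator, NamedTuple, Optional


class PiiMatch(NamedTuple):
    kind: str
    start: int
    end: int
    text: str


# Illustrative, US-centric patterns; tune or extend per deployment.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),  # 13-16 digits, optional separators
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum: real card numbers pass, most random digit runs do not."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_pii(text: str, custom: Optional[Dict[str, re.Pattern]] = None) -> Iterator[PiiMatch]:
    """Yield every match; `custom` adds deployment-specific patterns (the configurable case)."""
    for kind, pattern in {**PII_PATTERNS, **(custom or {})}.items():
        for m in pattern.finditer(text):
            if kind == "credit_card" and not luhn_valid(m.group()):
                continue
            yield PiiMatch(kind, m.start(), m.end(), m.group())
```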

Redaction Strategies

Mask: Replace with ***
Hash: Replace with a SHA-256 hash, so the same value always maps to the same output
Tokenize: Replace with unique tokens
Remove: Complete removal

Example

Input:  "John Doe's SSN is 123-45-6789"
Output: "John Doe's SSN is ***-**-****"
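The four strategies can be written as interchangeable functions applied to each detected value. This sketch detects only SSNs. `Tokenizer`, the `<PII_n>` token format, and the 16-character hash truncation are hypothetical choices. Masking keeps separators, so it reproduces the example output above.

```python
import hashlib
import re
from typing import Callable, Dict

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask(value: str) -> str:
    """Replace letters and digits with '*' but keep separators: 123-45-6789 -> ***-**-****."""
    return re.sub(r"[A-Za-z0-9]", "*", value)


def hash_value(value: str) -> str:
    """Deterministic: the same input always yields the same (truncated) SHA-256 digest."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def remove(value: str) -> str:
    return ""


class Tokenizer:
    """Gives each distinct value a stable token and keeps a vault for authorized re-identification."""

    def __init__(self) -> None:
        self.vault: Dict[str, str] = {}    # token -> original value
        self._tokens: Dict[str, str] = {}  # original value -> token

    def __call__(self, value: str) -> str:
        if value not in self._tokens:
            token = f"<PII_{len(self._tokens) + 1}>"
            self._tokens[value] = token
            self.vault[token] = value
        return self._tokens[value]


def redact(text: str, strategy: Callable[[str], str]) -> str:
    return SSN.sub(lambda m: strategy(m.group()), text)


print(redact("John Doe's SSN is 123-45-6789", mask))  # John Doe's SSN is ***-**-****
```

SSNs have only about a billion possible values, so a plain unsalted SHA-256 digest can be reversed by brute force. A keyed hash such as HMAC-SHA-256 with a secret key keeps the consistency property without that weakness.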

RAG Integration

Context Augmentation

# Without RAG
prompt = "What is title insurance?"

# With RAG
context = rag.retrieve("title insurance", suite_id="fedsuite")
prompt = f"""
Based on this context:
{context}

Answer: What is title insurance?
"""

Hierarchical Context

The AI Broker draws on the RAG service's four-level knowledge hierarchy, from most specific to most general:

Entity-specific knowledge
Module-specific knowledge
Suite-specific knowledge
Platform-global knowledge
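One way to combine the four levels is to query each scope present in the request context and weight more specific levels higher. Duplicate passages then keep only their best-scoring copy. The weights, the `search` callback, and the `"global"` platform scope id are hypothetical. How the RAG service actually merges levels is not specified here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Most specific to most general, matching the list above.
LEVELS = ("entity", "module", "suite", "platform")
# Hypothetical weights: at equal similarity, more specific knowledge ranks higher.
LEVEL_WEIGHT = {"entity": 1.3, "module": 1.2, "suite": 1.1, "platform": 1.0}

SearchFn = Callable[[str, str, str], List[Tuple[str, float]]]  # (query, level, scope_id) -> [(text, score)]


def hierarchical_retrieve(
    query: str, context: Dict[str, Optional[str]], search: SearchFn, top_k: int = 5
) -> List[Tuple[str, str, float]]:
    """Query each level present in the request context; return (level, text, weighted score), best first."""
    best: Dict[str, Tuple[str, str, float]] = {}
    for level in LEVELS:
        scope_id = "global" if level == "platform" else context.get(f"{level}_id")
        if not scope_id:
            continue  # e.g. no entity_id in the request: skip that level
        for text, score in search(query, level, scope_id):
            weighted = score * LEVEL_WEIGHT[level]
            # The same passage may exist at several levels; keep its best-scoring copy.
            if text not in best or weighted > best[text][2]:
                best[text] = (level, text, weighted)
    return sorted(best.values(), key=lambda hit: hit[2], reverse=True)[:top_k]
```

The `context` argument reuses the `suite_id` and `module_id` keys from the inference request body.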

Configuration

Environment Variables

# Service
PORT=8085
LOG_LEVEL=INFO

# AI Models
ANTHROPIC_API_KEY=sk-ant-xxx
OPENAI_API_KEY=sk-xxx
DEFAULT_MODEL=claude-3-5-sonnet-20241022

# RAG Integration
RAG_SERVICE_URL=http://localhost:8138

# Gateway Integration
CORTX_GATEWAY_URL=http://localhost:8080

# PII Redaction
ENABLE_PII_REDACTION=true
REDACTION_MODE=mask  # mask, hash, remove

# Authentication
REQUIRE_AUTH=false  # Set to "true" for production
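These variables could be loaded into a typed, validated settings object at startup. The sketch uses only the standard library. `BrokerSettings` is a hypothetical name, the defaults mirror the example values above, and the accepted `REDACTION_MODE` values come from the comment on that line.

```python
import os
from dataclasses import dataclass, field
from typing import Optional

VALID_REDACTION_MODES = {"mask", "hash", "remove"}  # values listed for REDACTION_MODE above


def _env_bool(name: str, default: bool) -> bool:
    return os.environ.get(name, str(default)).strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class BrokerSettings:
    port: int
    log_level: str
    default_model: str
    rag_service_url: str
    gateway_url: str
    enable_pii_redaction: bool
    redaction_mode: str
    require_auth: bool
    # repr=False keeps provider keys out of any log line that prints the settings object.
    anthropic_api_key: Optional[str] = field(default=None, repr=False)
    openai_api_key: Optional[str] = field(default=None, repr=False)

    @classmethod
    def from_env(cls) -> "BrokerSettings":
        mode = os.environ.get("REDACTION_MODE", "mask").strip().lower()
        if mode not in VALID_REDACTION_MODES:
            raise ValueError(f"REDACTION_MODE must be one of {sorted(VALID_REDACTION_MODES)}, got {mode!r}")
        return cls(
            port=int(os.environ.get("PORT", "8085")),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
            default_model=os.environ.get("DEFAULT_MODEL", "claude-3-5-sonnet-20241022"),
            rag_service_url=os.environ.get("RAG_SERVICE_URL", "http://localhost:8138"),
            gateway_url=os.environ.get("CORTX_GATEWAY_URL", "http://localhost:8080"),
            enable_pii_redaction=_env_bool("ENABLE_PII_REDACTION", True),
            redaction_mode=mode,
            require_auth=_env_bool("REQUIRE_AUTH", False),
            anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY"),
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
        )
```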

Usage Examples

Generate AI Response

curl -X POST http://localhost:8085/api/ai/inference \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: tenant-123" \
  -d '{
    "prompt": "What are the steps for title verification?",
    "model": "claude-3-5-sonnet-20241022",
    "use_rag": true,
    "context": {
      "suite_id": "fedsuite",
      "module_id": "title"
    },
    "max_tokens": 1000
  }'
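Here is the same request from Python, using only the standard library. `build_inference_request` and `infer` are hypothetical helpers. The optional bearer token is an assumption, based on the `Authorization` header in the model-listing example and the `REQUIRE_AUTH` setting.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_inference_request(
    base_url: str, tenant_id: str, payload: Dict[str, Any], token: Optional[str] = None
) -> urllib.request.Request:
    """Build, but do not send, the POST /api/ai/inference request from the curl example."""
    headers = {"Content-Type": "application/json", "X-Tenant-ID": tenant_id}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # assumed necessary when REQUIRE_AUTH=true
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/ai/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def infer(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON body (response, model_used, sources, ...)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


payload = {
    "prompt": "What are the steps for title verification?",
    "model": "claude-3-5-sonnet-20241022",
    "use_rag": True,
    "context": {"suite_id": "fedsuite", "module_id": "title"},
    "max_tokens": 1000,
}
req = build_inference_request("http://localhost:8085", "tenant-123", payload)
# result = infer(req)  # requires the service to be running
```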

Response:

{
  "response": "Title verification involves the following steps:\n1. ...",
  "model_used": "claude-3-5-sonnet-20241022",
  "tokens_used": 487,
  "rag_chunks_used": 5,
  "sources": ["Title Verification Guide", "Legal Requirements"],
  "correlation_id": "req-abc123"
}

Explain Validation Failure

curl -X POST http://localhost:8085/api/ai/explain \
  -H "Content-Type: application/json" \
  -d '{
    "rule_id": "title.verification.ownership",
    "failure_data": {
      "property_id": "prop-123",
      "issue": "ownership_mismatch"
    },
    "include_remediation": true
  }'

Generate WorkflowPack

curl -X POST http://localhost:8085/api/ai/generate-workflow \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Create a workflow for property title transfer with legal approval and lien verification",
    "domain": "legal"
  }'

Response:

{
  "workflow_pack": {
    "id": "wp-generated-001",
    "name": "Property Title Transfer",
    "steps": [
      {"name": "verify_ownership", "type": "validation"},
      {"name": "check_liens", "type": "validation"},
      {"name": "legal_approval", "type": "hil_approval"},
      {"name": "transfer_title", "type": "action"}
    ]
  },
  "confidence": 0.92,
  "model_used": "claude-3-5-sonnet-20241022"
}
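Generated workflows come from a model, so a caller may want to sanity-check them before use. The sketch below assumes the allowed step types are the three that appear in this response. The confidence threshold of 0.8 is a hypothetical placeholder.

```python
from typing import Any, Dict, List

# Step types seen in the example response; the platform may accept others.
KNOWN_STEP_TYPES = {"validation", "hil_approval", "action"}


def review_generated_pack(result: Dict[str, Any], min_confidence: float = 0.8) -> List[str]:
    """Return a list of problems; an empty list means the pack passed these basic checks."""
    problems: List[str] = []
    steps = (result.get("workflow_pack") or {}).get("steps") or []
    if not steps:
        problems.append("workflow has no steps")
    names = [step.get("name") for step in steps]
    if len(names) != len(set(names)):
        problems.append("duplicate step names")
    for step in steps:
        if step.get("type") not in KNOWN_STEP_TYPES:
            problems.append(f"step {step.get('name')!r} has unknown type {step.get('type')!r}")
    confidence = result.get("confidence", 0.0)
    if confidence < min_confidence:
        problems.append(f"confidence {confidence} is below {min_confidence}")
    return problems
```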

List Available Models

curl http://localhost:8085/api/ai/models \
  -H "Authorization: Bearer <token>"

Response:

{
  "models": [
    {
      "id": "claude-3-5-sonnet-20241022",
      "provider": "anthropic",
      "status": "available",
      "cost_per_1k_tokens": 0.003
    },
    {
      "id": "gpt-4-turbo",
      "provider": "openai",
      "status": "available",
      "cost_per_1k_tokens": 0.01
    }
  ]
}
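With `cost_per_1k_tokens`, the per-request cost is `tokens_used / 1000` multiplied by the listed price. The sketch applies this to the 487 tokens from the inference example above and uses `Decimal` to avoid floating-point rounding. Treating the single listed price as a flat rate is a simplification, because both providers bill input and output tokens at different rates.

```python
from decimal import Decimal

# Prices copied from the /api/ai/models response above (USD per 1,000 tokens).
COST_PER_1K = {
    "claude-3-5-sonnet-20241022": Decimal("0.003"),
    "gpt-4-turbo": Decimal("0.01"),
}


def estimate_cost(model_id: str, tokens_used: int) -> Decimal:
    """Flat-rate estimate: tokens / 1000 * price per 1k tokens."""
    return Decimal(tokens_used) / 1000 * COST_PER_1K[model_id]


print(estimate_cost("claude-3-5-sonnet-20241022", 487))  # 0.001461
print(estimate_cost("gpt-4-turbo", 487))                 # 0.00487
```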

Performance

Throughput

Simple Prompts: ~50 requests/second
RAG-Enhanced: ~20 requests/second
Complex Generation: ~5 requests/second

Latency

Claude 3.5 Sonnet: 2-5s
Claude 3 Haiku: 0.5-2s
GPT-4 Turbo: 3-6s

Cost Optimization

Automatic model selection based on complexity
Caching for repeated prompts
Token usage tracking and limits
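Complexity-based selection and prompt caching can be sketched together. The length-and-keyword heuristic in `pick_model` and the `ResponseCache` class are hypothetical. The cache key includes the tenant id so that one tenant's cached answer is never served to another. Reusing responses is only appropriate when generation settings make outputs repeatable enough to share.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


def pick_model(prompt: str) -> str:
    """Hypothetical heuristic: long or analytical prompts get the stronger model."""
    text = prompt.lower()
    complex_request = len(prompt) > 2000 or any(w in text for w in ("analyze", "compare", "workflow"))
    return "claude-3-5-sonnet-20241022" if complex_request else "claude-3-haiku-20240307"


class ResponseCache:
    """Small LRU cache of model responses keyed on (tenant, model, prompt)."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self._data = OrderedDict()  # cache key -> response text, least recently used first

    def __len__(self) -> int:
        return len(self._data)

    @staticmethod
    def key(tenant_id: str, model: str, prompt: str) -> str:
        raw = "\x00".join((tenant_id, model, prompt))  # NUL separator avoids ambiguous joins
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_call(self, tenant_id: str, model: str, prompt: str, call: Callable[[str, str], str]) -> str:
        k = self.key(tenant_id, model, prompt)
        if k in self._data:
            self._data.move_to_end(k)  # mark as most recently used
            return self._data[k]
        result = call(model, prompt)
        self._data[k] = result
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
        return result
```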

Security Features

PII Protection

Automatic redaction before AI processing
Audit log of all redactions
Compliance with HIPAA/GDPR

API Key Management

Secure storage of provider API keys
Key rotation support
Usage quotas and rate limiting
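Per-tenant rate limiting is often implemented as a token bucket. This is one common technique, not necessarily the one the broker uses. The rates are placeholders, and keying buckets on the `X-Tenant-ID` header value is an assumption. An injectable clock keeps the behaviour testable.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled continuously at `rate` per second."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


_buckets: Dict[str, TokenBucket] = {}


def allow_request(tenant_id: str, rate: float = 5.0, capacity: int = 10) -> bool:
    """Per-tenant check with placeholder limits, keyed on the X-Tenant-ID value."""
    if tenant_id not in _buckets:
        _buckets[tenant_id] = TokenBucket(rate, capacity)
    return _buckets[tenant_id].allow()
```

If you pass a request's token count as `cost`, the same structure becomes a token-usage quota instead of a request-rate limit.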

Content Filtering

Input validation
Output sanitization
Harmful content detection

Documentation

OpenAPI Spec: openapi.yaml
Source Code: /services/ai-broker/app/main.py
Claude API: Anthropic Documentation

Support

For issues or questions:

GitHub Issues: sinergysolutionsllc/sinergysolutionsllc
Internal Documentation: /docs/services/ai-broker/