API Client
The APIClient class provides a unified interface for interacting with various AI model APIs, handling both native provider-specific implementations and generic model access through OpenRouter and LiteLLM. It includes streaming support, advanced callback handling, and comprehensive message preprocessing.
Architecture
The API client architecture (as of v0.3.3.3.post1) consists of:
- APIClient - Main interface with enhanced streaming and callback handling
- Client Handlers - Provider-specific implementations (native adapters or gateway classes)
- Message Processing - Advanced preprocessing with system prompt injection
- Streaming Support - Real-time callback handling with async/sync compatibility
Provider Architecture
The APIClient sits in front of a client handler layer: native, provider-specific adapters where available, with LiteLLM and OpenRouter gateways providing generic access to everything else.
Message Flow
A call to get_response() runs the messages through preprocessing (system prompt injection), hands them to the selected client handler, and sends them to the provider API; streamed chunks are relayed to the stream callback, and the assembled response is returned to the caller.
Initialization
def __init__(self, model_config: ModelConfig):
    """
    Initialize the APIClient.

    Args:
        model_config (ModelConfig): Configuration for the AI model.
    """
The client automatically attempts to use the most appropriate adapter:
- First tries to use a native, provider-specific adapter (e.g., direct Anthropic SDK)
- Falls back to generic provider adapters via LiteLLM if a native adapter isn't available (the selection order is sketched below)
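As a rough illustration of that selection order (the registry and the string names here are simplified stand-ins, not the actual implementation):
NATIVE_ADAPTERS = {"anthropic": "AnthropicAdapter"}  # providers with a native SDK adapter

def pick_handler(provider: str, client_preference: str) -> str:
    """Return the kind of handler a config would resolve to (illustrative only)."""
    if client_preference == "native" and provider in NATIVE_ADAPTERS:
        return NATIVE_ADAPTERS[provider]   # direct SDK adapter
    if client_preference == "openrouter":
        return "OpenRouterGateway"         # OpenRouter gateway
    return "LiteLLMGateway"                # generic fallback via LiteLLM

print(pick_handler("anthropic", "native"))  # AnthropicAdapter
print(pick_handler("openai", "native"))     # LiteLLMGateway (no native OpenAI adapter yet)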
Key Methods
Enhanced Message Processing
def _prepare_messages_with_system_prompt(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]
Injects the system prompt at the beginning of the message list while preserving other system messages (action results, iteration markers), ensuring the system prompt always occupies the first (slot-0) position. A simplified sketch follows the feature list.
Features:
- Preserves action results and iteration markers
- Handles duplicate system prompt removal
- Maintains message order integrity
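A simplified approximation of this behavior (not the actual implementation; it only shows the slot-0 injection and duplicate removal):
def inject_system_prompt(system_prompt, messages):
    # Keep action results, iteration markers, and other system messages,
    # but drop any earlier copy of the system prompt itself
    kept = [
        m for m in messages
        if not (m.get("role") == "system" and m.get("content") == system_prompt)
    ]
    # The system prompt always goes in the first (slot-0) position
    return [{"role": "system", "content": system_prompt}] + kept

history = [
    {"role": "system", "content": "[Action result] tests passed"},
    {"role": "user", "content": "What is the status?"},
]
print(inject_system_prompt("You are a helpful assistant.", history))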
Advanced Streaming with Callback Handling
async def get_response(
    self,
    messages: List[Dict[str, Any]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: Optional[bool] = None,
    stream_callback: Optional[Callable[[str], None]] = None,
    **kwargs: Any
) -> str
Unified interface for both streaming and non-streaming responses with enhanced callback handling.
Parameters:
- messages: List of message dictionaries (OpenAI format)
- max_tokens: Maximum tokens to generate
- temperature: Sampling temperature
- stream: Whether to use streaming mode
- stream_callback: Callback accepting (chunk: str, message_type: str)
- **kwargs: Additional parameters (reasoning config, etc.)
Features:
- Automatic callback wrapping for async/sync compatibility (see the wrapper sketch below)
- Enhanced logging with request IDs and token counting
- Fallback retry logic for empty responses
- Comprehensive error handling and recovery
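The async/sync callback wrapping can be pictured roughly like this (a sketch, not the actual code path inside APIClient):
import asyncio
import inspect

def wrap_stream_callback(callback):
    """Accept either a plain function or a coroutine function as the stream callback."""
    async def invoke(chunk: str, message_type: str = "assistant"):
        result = callback(chunk, message_type)
        if inspect.isawaitable(result):  # async callback -> await it
            await result
    return invoke

def sync_callback(chunk, message_type="assistant"):
    print(chunk, end="", flush=True)

asyncio.run(wrap_stream_callback(sync_callback)("Hello", "assistant"))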
Token Counting with Fallback Strategy
def count_tokens(self, content: Union[str, List, Dict]) -> int
Counts tokens using the provider-specific tokenizer, with an intelligent fallback strategy (a sketch follows the list).
Fallback Chain:
- Client handler's native token counter
- LiteLLM generic counter for model
- Rough character-based estimation (final fallback)
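A sketch of that fallback chain, assuming LiteLLM's token_counter helper (the real method lives on APIClient and uses the configured handler; the model name is only an example):
def count_tokens(content, handler=None, model="gpt-4o"):
    if handler is not None and hasattr(handler, "count_tokens"):
        return handler.count_tokens(content)        # 1. native tokenizer on the client handler
    try:
        import litellm
        return litellm.token_counter(model=model, text=str(content))  # 2. generic LiteLLM counter
    except Exception:
        return max(1, len(str(content)) // 4)       # 3. rough character-based estimate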
System Prompt Management
def set_system_prompt(self, prompt: str) -> None
Sets the system prompt with debug logging for length tracking.
Performance Optimization
def _ensure_litellm_configured(self) -> None
Lazy initialization of LiteLLM to avoid import-time overhead. Disables debugging and sets optimal defaults for production use.
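For illustration, a lazy-initialization pattern of this kind might look like the following (the specific LiteLLM settings are examples, not necessarily the ones Penguin applies):
class LiteLLMLazyInit:
    _litellm_ready = False

    def _ensure_litellm_configured(self) -> None:
        if self._litellm_ready:
            return
        import litellm                  # deferred import keeps startup fast
        litellm.set_verbose = False     # disable debug logging
        litellm.drop_params = True      # silently drop params a provider doesn't support
        self._litellm_ready = True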
Multi-Provider Support
The APIClient supports multiple providers through an adapter architecture:
- Native Adapters: Direct SDK implementations for:
  - Anthropic (Claude models)
  - OpenAI (GPT models) (planned but not yet implemented)
  - More to come
- Generic Adapters via LiteLLM:
  - OpenAI
  - Anthropic
  - Ollama
  - DeepSeek
  - Azure OpenAI
  - AWS Bedrock
  - And many more
The selection between native and generic adapters is controlled by the use_native_adapter flag in the model configuration.
Provider-Specific Features
Anthropic (Native Adapter)
class AnthropicAdapter(BaseAdapter):
    # Direct implementation using Anthropic's SDK
- Native multi-modal support
- Streaming implementation
- Direct token counting
- Base64 image handling
- Vision support for Claude 3 models
OpenAI (via LiteLLM)
class OpenAIAdapter(ProviderAdapter):
    # Implementation for OpenAI through provider adapter
- Multi-modal content support
- Message formatting for OpenAI-specific features
- Support for OpenAI's message structure
Ollama (via LiteLLM)
class OllamaAdapter(ProviderAdapter):
    # Implementation for Ollama through provider adapter
- Local model support
- Basic message formatting
DeepSeek (via LiteLLM)
class DeepseekAdapter(ProviderAdapter):
    # Implementation for DeepSeek through provider adapter
- System message handling
- Role alternation enforcement (one common approach is sketched below)
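One common way to enforce role alternation is to merge consecutive messages that share a role; a hedged sketch of that idea (not necessarily how the DeepseekAdapter implements it):
def enforce_role_alternation(messages):
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Fold consecutive same-role messages into one turn
            merged[-1]["content"] += "\n" + str(msg["content"])
        else:
            merged.append(dict(msg))
    return merged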
Configuration
Configure the API client through ModelConfig with the enhanced client preference system:
@dataclass
class ModelConfig:
    model: str
    provider: str
    client_preference: str = "native"  # "native", "litellm", or "openrouter"
    api_base: Optional[str] = None
    api_key: Optional[str] = None
    api_version: Optional[str] = None
    max_tokens: Optional[int] = None
    max_history_tokens: Optional[int] = None
    temperature: float = 0.7
    streaming_enabled: bool = True
    enable_token_counting: bool = True
    vision_enabled: bool = None
    reasoning_enabled: bool = False
    reasoning_effort: Optional[str] = None
    reasoning_max_tokens: Optional[int] = None
    reasoning_exclude: bool = False
Client Preference Options:
"native"
: Use direct provider SDK (e.g., Anthropic SDK)"litellm"
: Use LiteLLM gateway for all providers"openrouter"
: Use OpenRouter gateway with automatic model discovery
The client preference controls which handler implementation is used, with automatic fallback to LiteLLM if a native adapter is unavailable.
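For example, the three preferences map to configurations like these (the model names and api_base are placeholders):
native_config = ModelConfig(
    model="claude-3-5-sonnet-20241022", provider="anthropic", client_preference="native"
)
litellm_config = ModelConfig(
    model="ollama/llama3", provider="ollama", client_preference="litellm",
    api_base="http://localhost:11434"
)
openrouter_config = ModelConfig(
    model="openai/gpt-4o", provider="openai", client_preference="openrouter"
)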
Usage Examples
Basic Setup with Enhanced Streaming
from penguin.llm.model_config import ModelConfig
from penguin.llm.api_client import APIClient
# Initialize with OpenRouter gateway for automatic model discovery
model_config = ModelConfig(
    model="openai/gpt-5",
    provider="openai",
    client_preference="openrouter",  # Automatic model specs fetching
    streaming_enabled=True
)
# Create client (automatically configures appropriate handler)
api_client = APIClient(model_config=model_config)
api_client.set_system_prompt("You are a helpful assistant.")
Advanced Streaming with Enhanced Callbacks
# Enhanced streaming callback with message type support
async def streaming_callback(chunk: str, message_type: str = "assistant"):
    if message_type == "reasoning":
        print(f"[REASONING] {chunk}", end="", flush=True)
    else:
        print(f"[RESPONSE] {chunk}", end="", flush=True)
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]
# Get streaming response with enhanced callback
response = await api_client.get_response(
    messages,
    stream=True,
    stream_callback=streaming_callback,
    max_tokens=1000,
    temperature=0.7
)
print(f"\n\nFinal response: {response}")
Model Switching with Automatic Configuration
# Switch to a different model at runtime
new_config = ModelConfig(
    model="anthropic/claude-sonnet-4",
    provider="anthropic",
    client_preference="openrouter"  # Will fetch latest specs automatically
)
new_api_client = APIClient(new_config)
# Automatically fetches context window, pricing, and capabilities from OpenRouter
Enhanced Error Handling and Retry Logic
try:
    response = await api_client.get_response(
        messages,
        stream=True,
        stream_callback=streaming_callback
    )
    # APIClient automatically handles:
    # - Empty response retries with streaming disabled
    # - Token counting and logging
    # - Async/sync callback compatibility
    # - Comprehensive error reporting
except Exception as e:
    print(f"Request failed: {e}")
    # APIClient provides detailed error context for debugging
Multi-Modal Content with Vision Models
# Encode image for vision models
base64_image = api_client.encode_image_to_base64("diagram.png")
vision_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this architecture diagram:"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"
                }
            }
        ]
    }
]
# Process normally - APIClient handles format conversion for each provider
response = await api_client.get_response(vision_messages)
Performance Features
Lazy Initialization
- LiteLLM is configured only on first use to avoid import overhead
- Client handlers are initialized during first API call
- Memory usage optimized for fast startup scenarios
Enhanced Logging
- Request IDs for tracing API calls
- Token counting with automatic fallback
- Detailed error context and recovery information
- Performance metrics and timing data
Error Recovery
- Automatic retry logic for empty responses (illustrated below)
- Graceful fallback between streaming and non-streaming modes
- Comprehensive error reporting with context
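Conceptually, the empty-response fallback behaves like this wrapper (the real logic is internal to get_response):
async def get_response_with_fallback(api_client, messages, **kwargs):
    response = await api_client.get_response(messages, stream=True, **kwargs)
    if not response or not response.strip():
        # Retry once with streaming disabled before surfacing an empty result
        response = await api_client.get_response(messages, stream=False, **kwargs)
    return response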
Extension Points
To add a new provider:
- For Native Adapters: Create a new implementation in the adapters/ directory implementing the handler interface
- For Gateway Adapters: Extend LiteLLMGateway or OpenRouterGateway with provider-specific logic
- Register Client Handler: Add to the client preference selection logic in APIClient.__init__
- Implement Required Methods: get_response(), count_tokens(), and streaming support (a hypothetical skeleton appears at the end of this section)
The modular architecture allows easy integration of new providers while maintaining consistent interfaces across all implementations.
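A hypothetical skeleton for a new native adapter could look like the following (a real implementation would live in adapters/ and follow BaseAdapter's interface; the class name and bodies here are illustrative, not actual Penguin code):
class MyProviderAdapter:
    def __init__(self, model_config: ModelConfig):
        self.config = model_config

    async def get_response(
        self,
        messages,
        max_tokens=None,
        temperature=None,
        stream=False,
        stream_callback=None,
        **kwargs,
    ) -> str:
        # Call the provider SDK here, relaying chunks to stream_callback when streaming
        raise NotImplementedError

    def count_tokens(self, content) -> int:
        # Replace with the provider's tokenizer; rough estimate as a placeholder
        return max(1, len(str(content)) // 4)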