API Client
The APIClient class provides a unified interface for interacting with various AI model APIs, handling both native provider-specific implementations and generic model access through OpenRouter and LiteLLM. It includes streaming support, advanced callback handling, and comprehensive message preprocessing.
Architecture
The API client architecture (as of v0.3.3.3.post1) consists of:
- APIClient - Main interface with enhanced streaming and callback handling
- Client Handlers - Provider-specific implementations (native adapters or gateway classes)
- Message Processing - Advanced preprocessing with system prompt injection
- Streaming Support - Real-time callback handling with async/sync compatibility
Provider Architecture
The APIClient sits in front of a client handler layer: native, provider-specific adapters where available, with LiteLLM and OpenRouter gateways providing generic access to everything else.
Message Flow
A call to get_response() runs the messages through preprocessing (system prompt injection), hands them to the selected client handler, and sends them to the provider API; streamed chunks are relayed to the stream callback, and the assembled response is returned to the caller.
Initialization
def __init__(self, model_config: ModelConfig):
    """
    Initialize the APIClient.

    Args:
        model_config (ModelConfig): Configuration for the AI model.
    """
The client automatically attempts to use the most appropriate adapter:
- First tries to use a native, provider-specific adapter (e.g., direct Anthropic SDK)
- Falls back to generic provider adapters via LiteLLM if a native adapter isn't available (the selection order is sketched below)
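As a rough illustration of that selection order (the registry and the string names here are simplified stand-ins, not the actual implementation):
NATIVE_ADAPTERS = {"anthropic": "AnthropicAdapter"}  # providers with a native SDK adapter

def pick_handler(provider: str, client_preference: str) -> str:
    """Return the kind of handler a config would resolve to (illustrative only)."""
    if client_preference == "native" and provider in NATIVE_ADAPTERS:
        return NATIVE_ADAPTERS[provider]   # direct SDK adapter
    if client_preference == "openrouter":
        return "OpenRouterGateway"         # OpenRouter gateway
    return "LiteLLMGateway"                # generic fallback via LiteLLM

print(pick_handler("anthropic", "native"))  # AnthropicAdapter
print(pick_handler("openai", "native"))     # LiteLLMGateway (no native OpenAI adapter yet)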
Key Methods
Enhanced Message Processing
def _prepare_messages_with_system_prompt(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]
Injects the system prompt at the beginning of the message list while preserving other system messages (action results, iteration markers), ensuring the system prompt always occupies the first (slot-0) position. A simplified sketch follows the feature list.
Features:
- Preserves action results and iteration markers
- Handles duplicate system prompt removal
- Maintains message order integrity
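A simplified approximation of this behavior (not the actual implementation; it only shows the slot-0 injection and duplicate removal):
def inject_system_prompt(system_prompt, messages):
    # Keep action results, iteration markers, and other system messages,
    # but drop any earlier copy of the system prompt itself
    kept = [
        m for m in messages
        if not (m.get("role") == "system" and m.get("content") == system_prompt)
    ]
    # The system prompt always goes in the first (slot-0) position
    return [{"role": "system", "content": system_prompt}] + kept

history = [
    {"role": "system", "content": "[Action result] tests passed"},
    {"role": "user", "content": "What is the status?"},
]
print(inject_system_prompt("You are a helpful assistant.", history))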
Advanced Streaming with Callback Handling
async def get_response(
    self,
    messages: List[Dict[str, Any]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: Optional[bool] = None,
    stream_callback: Optional[Callable[[str], None]] = None,
    **kwargs: Any
) -> str
Unified interface for both streaming and non-streaming responses with enhanced callback handling.
Parameters:
- messages: List of message dictionaries (OpenAI format)
- max_tokens: Maximum tokens to generate
- temperature: Sampling temperature
- stream: Whether to use streaming mode
- stream_callback: Callback accepting (chunk: str, message_type: str)
- **kwargs: Additional parameters (reasoning config, etc.)
Features:
- Automatic callback wrapping for async/sync compatibility (see the wrapper sketch below)
- Enhanced logging with request IDs and token counting
- Fallback retry logic for empty responses
- Comprehensive error handling and recovery
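The async/sync callback wrapping can be pictured roughly like this (a sketch, not the actual code path inside APIClient):
import asyncio
import inspect

def wrap_stream_callback(callback):
    """Accept either a plain function or a coroutine function as the stream callback."""
    async def invoke(chunk: str, message_type: str = "assistant"):
        result = callback(chunk, message_type)
        if inspect.isawaitable(result):  # async callback -> await it
            await result
    return invoke

def sync_callback(chunk, message_type="assistant"):
    print(chunk, end="", flush=True)

asyncio.run(wrap_stream_callback(sync_callback)("Hello", "assistant"))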
Token Counting with Fallback Strategy
def count_tokens(self, content: Union[str, List, Dict]) -> int
Counts tokens using the provider-specific tokenizer, with an intelligent fallback strategy (a sketch follows the list).
Fallback Chain:
- Client handler's native token counter
- LiteLLM generic counter for model
- Rough character-based estimation (final fallback)
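A sketch of that fallback chain, assuming LiteLLM's token_counter helper (the real method lives on APIClient and uses the configured handler; the model name is only an example):
def count_tokens(content, handler=None, model="gpt-4o"):
    if handler is not None and hasattr(handler, "count_tokens"):
        return handler.count_tokens(content)        # 1. native tokenizer on the client handler
    try:
        import litellm
        return litellm.token_counter(model=model, text=str(content))  # 2. generic LiteLLM counter
    except Exception:
        return max(1, len(str(content)) // 4)       # 3. rough character-based estimate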
System Prompt Management
def set_system_prompt(self, prompt: str) -> None
Sets the system prompt with debug logging for length tracking.
Performance Optimization
def _ensure_litellm_configured(self) -> None
Lazy initialization of LiteLLM to avoid import-time overhead. Disables debugging and sets optimal defaults for production use.
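For illustration, a lazy-initialization pattern of this kind might look like the following (the specific LiteLLM settings are examples, not necessarily the ones Penguin applies):
class LiteLLMLazyInit:
    _litellm_ready = False

    def _ensure_litellm_configured(self) -> None:
        if self._litellm_ready:
            return
        import litellm                  # deferred import keeps startup fast
        litellm.set_verbose = False     # disable debug logging
        litellm.drop_params = True      # silently drop params a provider doesn't support
        self._litellm_ready = True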
Multi-Provider Support
The APIClient supports multiple providers through an adapter architecture:
- Native Adapters: Direct SDK implementations for:
  - Anthropic (Claude models)
  - OpenAI (GPT models) (planned but not yet implemented)
  - More to come
- Generic Adapters via LiteLLM:
  - OpenAI
  - Anthropic
  - Ollama
  - DeepSeek
  - Azure OpenAI
  - AWS Bedrock
  - And many more
The selection between native and generic adapters is controlled by the use_native_adapter flag in the model configuration.
Provider-Specific Features
Anthropic (Native Adapter)
class AnthropicAdapter(BaseAdapter):
    # Direct implementation using Anthropic's SDK
- Native multi-modal support
- Streaming implementation
- Direct token counting
- Base64 image handling
- Vision support for Claude 3 models
OpenAI (via LiteLLM)
class OpenAIAdapter(ProviderAdapter):
    # Implementation for OpenAI through provider adapter
- Multi-modal content support
- Message formatting for OpenAI-specific features
- Support for OpenAI's message structure
Ollama (via LiteLLM)
class OllamaAdapter(ProviderAdapter):
    # Implementation for Ollama through provider adapter
- Local model support
- Basic message formatting
DeepSeek (via LiteLLM)
class DeepseekAdapter(ProviderAdapter):
    # Implementation for DeepSeek through provider adapter
- System message handling
- Role alternation enforcement (one common approach is sketched below)
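One common way to enforce role alternation is to merge consecutive messages that share a role; a hedged sketch of that idea (not necessarily how the DeepseekAdapter implements it):
def enforce_role_alternation(messages):
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Fold consecutive same-role messages into one turn
            merged[-1]["content"] += "\n" + str(msg["content"])
        else:
            merged.append(dict(msg))
    return merged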
Configuration
Configure the API client through ModelConfig with the enhanced client preference system:
@dataclass
class ModelConfig:
    model: str
    provider: str
    client_preference: str = "native"  # "native", "litellm", or "openrouter"
    api_base: Optional[str] = None
    api_key: Optional[str] = None
    api_version: Optional[str] = None
    max_tokens: Optional[int] = None
    max_history_tokens: Optional[int] = None
    temperature: float = 0.7
    streaming_enabled: bool = True
    enable_token_counting: bool = True
    vision_enabled: bool = None
    reasoning_enabled: bool = False
    reasoning_effort: Optional[str] = None
    reasoning_max_tokens: Optional[int] = None
    reasoning_exclude: bool = False
Client Preference Options:
"native"
: Use direct provider SDK (e.g., Anthropic SDK)"litellm"
: Use LiteLLM gateway for all providers"openrouter"
: Use OpenRouter gateway with automatic model discovery
The client preference controls which handler implementation is used, with automatic fallback to LiteLLM if a native adapter is unavailable.
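For example, the three preferences map to configurations like these (the model names and api_base are placeholders):
native_config = ModelConfig(
    model="claude-3-5-sonnet-20241022", provider="anthropic", client_preference="native"
)
litellm_config = ModelConfig(
    model="ollama/llama3", provider="ollama", client_preference="litellm",
    api_base="http://localhost:11434"
)
openrouter_config = ModelConfig(
    model="openai/gpt-4o", provider="openai", client_preference="openrouter"
)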
Usage Examples
Basic Setup with Enhanced Streaming
from penguin.llm.model_config import ModelConfig
from penguin.llm.api_client import APIClient
# Initialize with OpenRouter gateway for automatic model discovery
model_config = ModelConfig(
    model="openai/gpt-5",
    provider="openai",
    client_preference="openrouter",  # Automatic model specs fetching
    streaming_enabled=True
)
# Create client (automatically configures appropriate handler)
api_client = APIClient(model_config=model_config)
api_client.set_system_prompt("You are a helpful assistant.")
Advanced Streaming with Enhanced Callbacks
# Enhanced streaming callback with message type support
async def streaming_callback(chunk: str, message_type: str = "assistant"):
    if message_type == "reasoning":
        print(f"[REASONING] {chunk}", end="", flush=True)
    else:
        print(f"[RESPONSE] {chunk}", end="", flush=True)
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]
# Get streaming response with enhanced callback
response = await api_client.get_response(
    messages,
    stream=True,
    stream_callback=streaming_callback,
    max_tokens=1000,
    temperature=0.7
)
print(f"\n\nFinal response: {response}")
Model Switching with Automatic Configuration
# Switch to a different model at runtime
new_config = ModelConfig(
    model="anthropic/claude-sonnet-4",
    provider="anthropic",
    client_preference="openrouter"  # Will fetch latest specs automatically
)
new_api_client = APIClient(new_config)
# Automatically fetches context window, pricing, and capabilities from OpenRouter
Enhanced Error Handling and Retry Logic
try:
    response = await api_client.get_response(
        messages,
        stream=True,
        stream_callback=streaming_callback
    )
    # APIClient automatically handles:
    # - Empty response retries with streaming disabled
    # - Token counting and logging
    # - Async/sync callback compatibility
    # - Comprehensive error reporting
except Exception as e:
    print(f"Request failed: {e}")
    # APIClient provides detailed error context for debugging
Multi-Modal Content with Vision Models
# Encode image for vision models
base64_image = api_client.encode_image_to_base64("diagram.png")
vision_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this architecture diagram:"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"
                }
            }
        ]
    }
]
# Process normally - APIClient handles format conversion for each provider
response = await api_client.get_response(vision_messages)
Performance Features
Lazy Initialization
- LiteLLM is configured only on first use to avoid import overhead
- Client handlers are initialized during first API call
- Memory usage optimized for fast startup scenarios
Enhanced Logging
- Request IDs for tracing API calls
- Token counting with automatic fallback
- Detailed error context and recovery information
- Performance metrics and timing data
Error Recovery
- Automatic retry logic for empty responses (illustrated below)
- Graceful fallback between streaming and non-streaming modes
- Comprehensive error reporting with context
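Conceptually, the empty-response fallback behaves like this wrapper (the real logic is internal to get_response):
async def get_response_with_fallback(api_client, messages, **kwargs):
    response = await api_client.get_response(messages, stream=True, **kwargs)
    if not response or not response.strip():
        # Retry once with streaming disabled before surfacing an empty result
        response = await api_client.get_response(messages, stream=False, **kwargs)
    return response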
Extension Points
To add a new provider:
- For Native Adapters: Create a new implementation in the adapters/ directory implementing the handler interface
- For Gateway Adapters: Extend LiteLLMGateway or OpenRouterGateway with provider-specific logic
- Register Client Handler: Add to the client preference selection logic in APIClient.__init__
- Implement Required Methods: get_response(), count_tokens(), and streaming support (a hypothetical skeleton appears at the end of this section)
The modular architecture allows easy integration of new providers while maintaining consistent interfaces across all implementations.
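A hypothetical skeleton for a new native adapter could look like the following (a real implementation would live in adapters/ and follow BaseAdapter's interface; the class name and bodies here are illustrative, not actual Penguin code):
class MyProviderAdapter:
    def __init__(self, model_config: ModelConfig):
        self.config = model_config

    async def get_response(
        self,
        messages,
        max_tokens=None,
        temperature=None,
        stream=False,
        stream_callback=None,
        **kwargs,
    ) -> str:
        # Call the provider SDK here, relaying chunks to stream_callback when streaming
        raise NotImplementedError

    def count_tokens(self, content) -> int:
        # Replace with the provider's tokenizer; rough estimate as a placeholder
        return max(1, len(str(content)) // 4)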