Context Window Manager
The ContextWindowManager (v0.3.3.3.post1) handles advanced token budgeting and content trimming for conversation context, including multimodal content handling and dynamic budget allocation.
Overview
The ContextWindowManager is responsible for:
- Advanced Token Budgeting: Dynamic allocation based on content type and priority
- Image Handling: Intelligent trimming of multimodal content to preserve tokens
- Category-Based Management: Fine-grained control over different message types
- Real-time Usage Tracking: Live monitoring of token consumption
- Smart Trimming Strategies: Context-aware content removal with recency preservation
- Multi-Provider Token Counting: Automatic fallback tokenization strategies
- Rich Reporting: Human-readable and programmatic token usage statistics
Message Categories and Priorities
Messages are categorized into five priority levels for intelligent trimming:
- SYSTEM (highest priority): System instructions and prompts - never trimmed (10% budget)
- CONTEXT: Important reference information, documentation, and context files (35% budget)
- DIALOG: Main conversation between user and assistant (50% budget)
- SYSTEM_OUTPUT: Tool outputs, code execution results, and action responses (5% budget)
- ERROR: Error messages and debugging information (fallback 5% budget)
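To make the split concrete, the sketch below expresses these percentages as a lookup table. The enum values and constant names are illustrative only; the real `MessageCategory` and budget objects live inside Penguin and may differ.

```python
from enum import Enum

class MessageCategory(Enum):
    SYSTEM = 1
    CONTEXT = 2
    DIALOG = 3
    SYSTEM_OUTPUT = 4
    ERROR = 5

# Percentages mirror the defaults listed above; ERROR uses a small fallback slice.
DEFAULT_BUDGET_PCT = {
    MessageCategory.SYSTEM: 0.10,
    MessageCategory.CONTEXT: 0.35,
    MessageCategory.DIALOG: 0.50,
    MessageCategory.SYSTEM_OUTPUT: 0.05,
    MessageCategory.ERROR: 0.05,
}

def category_budget(max_tokens: int, category: MessageCategory) -> int:
    """Rough per-category budget for a given context window size."""
    return int(max_tokens * DEFAULT_BUDGET_PCT.get(category, 0.05))

print(category_budget(150_000, MessageCategory.DIALOG))  # 75000
```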
Priority System
- SYSTEM messages: Guaranteed preservation with strict 10% minimum allocation
- Chronological trimming: Within each category, oldest messages are trimmed first to preserve recency
- Dynamic budget management: Automatic budget redistribution when categories are under-utilized
- Image-aware processing: Special handling for multimodal content with token-intensive elements
- Fallback categories: Automatic handling of new message types with default budgets
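As a rough illustration of the chronological policy, a trimmer might drop messages from the front of a category's oldest-first list until it fits the budget. The helper below is a hypothetical sketch, not the actual implementation, which also enforces minimum allocations and image handling.

```python
def trim_oldest_first(messages, budget_tokens, count_tokens):
    """Drop the oldest messages in a category until the remainder fits.

    `messages` is assumed to be in chronological order; `count_tokens` is any
    callable that returns the token cost of a single message.
    """
    kept = list(messages)
    total = sum(count_tokens(m) for m in kept)
    while kept and total > budget_tokens:
        total -= count_tokens(kept.pop(0))  # remove the oldest message first
    return kept
```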
Initialization
def __init__(
self,
model_config=None,
token_counter: Optional[Callable[[Any], int]] = None,
api_client=None,
config_obj: Optional[Any] = None,
)
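A minimal construction example, assuming it is acceptable to omit the optional arguments and supply only a custom counter based on the documented 4-characters-per-token fallback heuristic:

```python
from penguin.system.context_window import ContextWindowManager

# Rough fallback-style counter: ~4 characters per token
def rough_counter(content) -> int:
    return max(1, len(str(content)) // 4)

cwm = ContextWindowManager(
    model_config=None,            # fall back to config_obj / config.yml / 150k default
    token_counter=rough_counter,
    api_client=None,
)
```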
Smart Configuration Resolution
The ContextWindowManager automatically resolves configuration from multiple sources in priority order:
- Model Configuration: `model_config.max_tokens` (if provided)
- Live Config Object: `config_obj.model_config.max_history_tokens` (preferred)
- Global Config: `config.yml` model settings
- Fallback: default of 150,000 tokens
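Conceptually the resolution order can be pictured as the sketch below; the helper name and attribute checks are assumptions for illustration, since the real logic runs inside the constructor.

```python
def resolve_max_tokens(model_config=None, config_obj=None, global_max=None) -> int:
    """Pick max_tokens from the highest-priority source that provides one."""
    if model_config is not None and getattr(model_config, "max_tokens", None):
        return model_config.max_tokens
    live = getattr(getattr(config_obj, "model_config", None), "max_history_tokens", None)
    if live:
        return live
    if global_max:                # e.g. parsed from config.yml model settings
        return global_max
    return 150_000                # documented default
```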
Intelligent Token Counter Selection
The system automatically selects the best available token counter:
- API Client Counter: `api_client.count_tokens()` (provider-specific)
- Diagnostics Counter: `diagnostics.count_tokens()` (tiktoken-based)
- Fallback Counter: rough estimation (4 characters per token)
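A simplified picture of that selection, with hypothetical attribute checks standing in for the real wiring:

```python
def select_token_counter(api_client=None, diagnostics=None):
    """Return the most accurate counting callable that is available."""
    if api_client is not None and hasattr(api_client, "count_tokens"):
        return api_client.count_tokens           # provider-specific counter
    if diagnostics is not None and hasattr(diagnostics, "count_tokens"):
        return diagnostics.count_tokens          # tiktoken-based counter
    return lambda content: max(1, len(str(content)) // 4)  # rough estimate
```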
Features
- Dynamic Max Tokens: Adapts to model capabilities automatically
- Multi-source Configuration: Supports live config updates
- Provider-aware Tokenization: Uses provider-specific token counting when available
- Robust Fallbacks: Graceful degradation when components are unavailable
Clamp Notices (Sub-Agents)
When a sub-agent is created with an isolated context window and a shared_cw_max_tokens
value, the child's ContextWindowManager.max_tokens
is set to the lower of the parent’s max and the provided clamp. A system note with metadata type=cw_clamp_notice
is recorded on both the parent and child conversations to make this visible in transcripts and dashboards. The metadata includes:
- `sub_agent`: child agent id
- `child_max`: effective child max tokens
- `parent_max`: parent's max tokens at time of spawn
- `clamped`: true if the child was reduced relative to the parent
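A minimal sketch of the clamp computation and the notice payload, using the metadata fields listed above (the call that records the note on both conversations is omitted):

```python
def clamp_child_window(parent_max: int, shared_cw_max_tokens: int, sub_agent: str) -> dict:
    """Compute the child's effective max tokens and the cw_clamp_notice metadata."""
    child_max = min(parent_max, shared_cw_max_tokens)
    return {
        "type": "cw_clamp_notice",
        "sub_agent": sub_agent,
        "child_max": child_max,
        "parent_max": parent_max,
        "clamped": child_max < parent_max,
    }
```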
Key Methods
Token Management
@property
def total_budget(self) -> int
Gets the total token budget.
@property
def available_tokens(self) -> int
Gets the number of available tokens.
def get_budget(self, category: MessageCategory) -> TokenBudget
Gets the token budget for a specific category.
def update_usage(self, category: MessageCategory, tokens: int) -> None
Updates token usage for a category.
def reset_usage(self, category: Optional[MessageCategory] = None) -> None
Resets token usage for a category or all categories.
def is_over_budget(self, category: Optional[MessageCategory] = None) -> bool
Checks if a category or the entire context is over budget.
Session Processing
def analyze_session(self, session: Session) -> Dict[str, Any]
Analyzes a session for token usage statistics and multimodal content.
def trim_session(self, session: Session, preserve_recency: bool = True) -> Session
Trims session messages to fit within token budget.
def process_session(self, session: Session) -> Session
Processes a session through token budgeting and trimming - main entry point for integration.
Advanced Reporting and Analytics
def get_token_usage(self) -> Dict[str, int]
Returns comprehensive token usage statistics with structure:
{
"total": 45000,
"available": 105000,
"max": 150000,
"usage_percentage": 30.0,
"MessageCategory.SYSTEM": 5000,
"MessageCategory.CONTEXT": 15000,
"MessageCategory.DIALOG": 20000,
"MessageCategory.SYSTEM_OUTPUT": 5000
}
def format_token_usage(self) -> str
Returns human-readable token usage summary with category breakdowns.
def format_token_usage_rich(self) -> str
Returns rich-formatted token usage with progress bars and color coding.
def get_current_allocations(self) -> Dict[MessageCategory, float]
Returns current token allocations as percentages for monitoring.
def get_usage(self, category: MessageCategory) -> int
Returns current token usage for a specific category.
Advanced Trimming Strategy
The trimming strategy (v0.3.3.3.post1) follows a multi-pass approach:
Phase 1: Image Optimization
- Multi-image Detection: Identify sessions with multiple images
- Recency Preservation: Keep the most recent image intact
- Token-Efficient Replacement: Replace older images with placeholders
- Metadata Preservation: Store original image references for context
Phase 2: Budget Analysis
- Total Budget Check: Compare session tokens against model limits
- Category Budget Analysis: Check individual category limits
- Priority Queue Creation: Order categories by trim priority
Phase 3: Intelligent Trimming
- Chronological Trimming: Remove oldest messages first within each category
- Minimum Budget Enforcement: Respect minimum token allocations
- SYSTEM Protection: Never trim SYSTEM messages
- Iterative Reduction: Continue trimming until budget is met
Phase 4: Session Reconstruction
- Order Preservation: Maintain original message ordering
- Category Integrity: Keep messages in their logical categories
- Metadata Updates: Update session statistics and metadata
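Taken together, the passes roughly follow the shape below. This is a pseudocode-level sketch built from the methods documented on this page; the real `process_session` also enforces minimum budgets and updates session metadata.

```python
def process_session_sketch(cwm, session):
    """Illustrative multi-pass flow mirroring Phases 1-4 above."""
    session = cwm._handle_image_trimming(session)      # Phase 1: image optimization
    stats = cwm.analyze_session(session)               # Phase 2: budget analysis
    if stats["over_budget"]:
        session = cwm.trim_session(session, preserve_recency=True)  # Phase 3: trimming
    return session                                     # Phase 4: reconstructed session
```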
Image Handling
The system includes specialized handling for multimodal content:
Image Token Estimation
- Claude Models: ~4,000 tokens per image (higher for safety)
- Vision Models: Provider-specific token calculations
- Fallback Estimation: Character-based approximation
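A rough estimator along those lines might look like this; the constants are approximations taken from the list above, not exact provider values:

```python
def estimate_image_tokens(model_name: str, approx_chars: int = 0) -> int:
    """Approximate the token cost of a single image for budgeting purposes."""
    if "claude" in model_name.lower():
        return 4_000                      # conservative per-image estimate for Claude
    # Other vision models would use provider-specific math; failing that,
    # fall back to the rough 4-characters-per-token heuristic.
    return max(1, approx_chars // 4)
```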
Smart Image Trimming
def _handle_image_trimming(self, session: Session) -> Session:
"""
Replace all but the most recent image with placeholders.
Preserves conversation context while saving significant tokens.
"""
Placeholder Generation
- Context Preservation: Maintains conversation flow
- Reference Tracking: Stores original image metadata
- User-Friendly Messages: Clear indication of image removal
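One way to picture the placeholder swap, using a plain dict in place of Penguin's actual message objects (hypothetical field names):

```python
def to_image_placeholder(message: dict, image_ref: str) -> dict:
    """Swap an image-bearing message for a short textual placeholder."""
    placeholder = {**message, "content": "[Image removed to save context space]"}
    metadata = dict(message.get("metadata") or {})
    metadata["original_image_ref"] = image_ref   # keep a pointer to the original image
    placeholder["metadata"] = metadata
    return placeholder
```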
Token Counter Fallback Chain
The system implements a robust fallback strategy for token counting:
- Provider-Specific Counter: Uses native model tokenizer (optimal accuracy)
- LiteLLM Generic Counter: Fallback to model-specific tiktoken
- Diagnostics Counter: Uses tiktoken with model mapping
- Character Estimation: Rough fallback (4 chars = 1 token)
This keeps token counting as accurate as the available tooling allows, degrading gracefully to a rough estimate when no tokenizer is available.
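The chain can be thought of as successive attempts in priority order, falling back to the character estimate when nothing else works. The helper below is a sketch; the counter callables are placeholders for the provider, LiteLLM, and diagnostics counters.

```python
def count_tokens_with_fallback(content, counters):
    """Try each counter in priority order; fall back to a character estimate.

    `counters` is an ordered list of callables, e.g. [provider_counter,
    litellm_counter, diagnostics_counter]; any of them may be None or raise.
    """
    for counter in counters:
        if counter is None:
            continue
        try:
            return int(counter(content))
        except Exception:
            continue
    return max(1, len(str(content)) // 4)   # ~4 characters per token
```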
Advanced Usage Examples
Basic Setup with Configuration
from penguin.system.context_window import ContextWindowManager
from penguin.llm.model_config import ModelConfig
# Initialize with configuration
context_window = ContextWindowManager(
model_config=model_config,
api_client=api_client,
config_obj=live_config # Supports live config updates
)
# Automatic configuration resolution from multiple sources
print(f"Max tokens: {context_window.max_tokens}")
print(f"Available tokens: {context_window.available_tokens}")
Comprehensive Session Analysis
# Analyze session with multimodal content detection
stats = context_window.analyze_session(session)
print(f"Total tokens: {stats['total_tokens']}")
print(f"Images detected: {stats['image_count']}")
print(f"Tokens by category: {stats['per_category']}")
print(f"Over budget: {stats['over_budget']}")
# Detailed breakdown
for category, tokens in stats['per_category'].items():
percentage = (tokens / stats['total_tokens']) * 100 if stats['total_tokens'] > 0 else 0
print(f" {category.name}: {tokens} tokens ({percentage:.1f}%)")
Intelligent Session Processing
# Process session with automatic trimming and optimization
original_token_count = context_window.analyze_session(session)['total_tokens']
trimmed_session = context_window.process_session(session)
final_token_count = context_window.analyze_session(trimmed_session)['total_tokens']
print(f"Original tokens: {original_token_count}")
print(f"Final tokens: {final_token_count}")
print(f"Reduction: {original_token_count - final_token_count} tokens")
# Session integrity preserved
print(f"Messages removed: {len(session.messages) - len(trimmed_session.messages)}")
print(f"SYSTEM messages preserved: {len([m for m in trimmed_session.messages if m.category.name == 'SYSTEM'])}")
Real-time Token Monitoring
# Monitor token usage in real-time
usage = context_window.get_token_usage()
print(context_window.format_token_usage())
# Rich formatting for CLI
if rich_available:
print(context_window.format_token_usage_rich())
# Programmatic access
for category in MessageCategory:
budget = context_window.get_budget(category)
usage = context_window.get_usage(category)
percentage = (usage / budget.max_tokens) * 100 if budget.max_tokens > 0 else 0
print(f"{category.name}: {usage}/{budget.max_tokens} ({percentage:.1f}%)")
Advanced Category Management
# Update usage tracking manually
context_window.update_usage(MessageCategory.CONTEXT, 5000)
# Check if any category is over budget
for category in MessageCategory:
if context_window.is_over_budget(category):
print(f"Warning: {category.name} is over budget!")
# Reset usage tracking
context_window.reset_usage() # Reset all
context_window.reset_usage(MessageCategory.DIALOG) # Reset specific category
Token Budgeting System
The dynamic token budget allocation system (v0.3.3.3.post1) includes:
Intelligent Budget Distribution
- SYSTEM: 10% (guaranteed preservation, strict minimum enforcement)
- CONTEXT: 35% (documentation and reference materials)
- DIALOG: 50% (main conversation, flexible allocation)
- SYSTEM_OUTPUT: 5% (tool results and action outputs)
- ERROR: 5% (fallback for error messages and debugging)
Dynamic Budget Management
- Automatic Redistribution: Unused budget from one category can benefit others
- Minimum Guarantees: Each category has a minimum token allocation
- Maximum Limits: Prevents any category from consuming excessive tokens
- Runtime Adjustments: Budgets can be modified during session processing
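A toy redistribution step might look like the sketch below; treating DIALOG as the beneficiary and handing over half the slack are arbitrary choices for illustration, not the manager's actual policy.

```python
def redistribute_unused(budgets: dict, usage: dict) -> dict:
    """Shift unused tokens from under-utilized categories toward DIALOG.

    `budgets` and `usage` map category name -> tokens.
    """
    adjusted = dict(budgets)
    spare = sum(max(0, budgets[c] - usage.get(c, 0)) for c in budgets if c != "DIALOG")
    adjusted["DIALOG"] = budgets["DIALOG"] + spare // 2   # hand over half the slack
    return adjusted
```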
Token Efficiency Features
- Image Optimization: Intelligent replacement of redundant images
- Smart Trimming: Chronological removal preserving conversation coherence
- Category-Aware Processing: Different strategies for different content types
- Metadata Preservation: Maintains conversation context during optimization
This system ensures optimal token utilization while preserving conversation quality and system functionality.