abstractllm

Name: abstractllm
Version: 1.1.5
Summary: A unified interface for large language models with support for OpenAI, Anthropic, Hugging Face, Ollama, and MLX
Author: Laurent-Philippe Albou <lpalbou@gmail.com>
Homepage: https://github.com/lpalbou/abstractllm
Upload time: 2025-10-08 08:04:45
Requires Python: >=3.8
Keywords: abstraction, ai, apple silicon, claude, gpt, huggingface, llm, mlx, ollama, openai
# AbstractLLM

[![PyPI version](https://badge.fury.io/py/abstractllm.svg)](https://badge.fury.io/py/abstractllm)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/release/python-311/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A unified interface for Large Language Models with memory, reasoning, and tool capabilities.

Version: 1.1.5

## Overview

AbstractLLM provides a consistent interface for multiple LLM providers while offering agentic capabilities including hierarchical memory systems, ReAct reasoning cycles, and universal tool support. The framework focuses on practical AI agent development.

## Table of Contents

- [Key Features](#key-features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Agent Development](#agent-development)
- [Enhanced Tools System](#enhanced-tools-system)
- [Memory & Reasoning](#memory--reasoning-alpha-testing)
- [Provider Support](#provider-support)
- [Command-Line Examples](#command-line-examples)
- [Contributing](#contributing)
- [License](#license)

## Key Features

### Core Infrastructure
- 🔄 **Universal Provider Support**: OpenAI, Anthropic, Ollama, HuggingFace, MLX, and LM Studio with consistent API
- 🔌 **Provider Agnostic**: Switch between providers with minimal code changes
- 🛠️ **Enhanced Tool System**: Tool creation with Pydantic validation and retry logic (alpha phase)
- 📊 **Model Capability Detection**: Automatic detection of tool support, vision capabilities, and context limits

### Agentic Capabilities (Alpha Testing)
- 🧠 **Hierarchical Memory**: Working, episodic, and semantic memory with cross-session persistence (alpha)
- 🔄 **ReAct Reasoning**: Complete reasoning cycles with scratchpad traces and fact extraction (alpha)
- 🌐 **Knowledge Graphs**: Automatic fact extraction and relationship mapping (alpha)
- 🎯 **Context-Aware Retrieval**: Memory-enhanced LLM prompting with relevant context injection (alpha)
- 📝 **Session Management**: Persistent conversations with memory consolidation

### Production Features
- 🖼️ **Vision Support**: Multimodal capabilities across compatible providers
- 📝 **Structured Output**: JSON/YAML response formatting with validation
- 🔤 **Type Safety**: Full type hints and enum-based parameters
- 🛑 **Unified Error Handling**: Consistent error handling with retry strategies
- 🍎 **Apple Silicon Optimization**: Native MLX support for M1/M2/M3 devices

## Installation

```bash
# Core installation with basic features
pip install abstractllm

# Provider-specific installations
pip install "abstractllm[openai]"       # OpenAI API support
pip install "abstractllm[anthropic]"    # Anthropic/Claude API support
pip install "abstractllm[ollama]"       # Ollama local models
pip install "abstractllm[huggingface]"  # HuggingFace models
pip install "abstractllm[mlx]"          # Apple Silicon MLX support
pip install "abstractllm[lmstudio]"     # LM Studio local API support
pip install "abstractllm[tools]"        # Enhanced tool system

# Comprehensive installation (recommended)
pip install "abstractllm[all]"          # All providers (MLX will install on Apple Silicon only)
```

**Note**: The `[all]` extra includes MLX dependencies, which are Apple Silicon specific. On non-Apple platforms, MLX dependencies will be installed but MLX functionality will not be available.
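
If you are unsure whether MLX is actually usable on your machine after installing `[all]`, a quick check such as the following settles it (this is a plain MLX snippet, not an AbstractLLM API):

```python
# Probe MLX availability (only succeeds on Apple Silicon).
try:
    import mlx.core as mx
    print("MLX available, default device:", mx.default_device())
except Exception as exc:  # ImportError or runtime failure on non-Apple hardware
    print("MLX not usable here:", exc)
```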

## Quick Start

### Basic LLM Usage

```python
from abstractllm import create_llm

# Create an LLM instance
llm = create_llm("openai", model="gpt-4o-mini")
response = llm.generate("Explain quantum computing briefly.")
print(response)

# Switch providers seamlessly
anthropic_llm = create_llm("anthropic", model="claude-3-5-sonnet-20241022")
response = anthropic_llm.generate("Tell me about yourself.")
print(response)
```

### Unified API Examples

AbstractLLM now provides a unified `generate()` method that handles all scenarios consistently:

```python
from abstractllm import create_llm
from abstractllm.session import Session
from abstractllm.tools import register

# Create tools for testing
@register
def get_current_time():
    import datetime
    return datetime.datetime.now().strftime("%H:%M:%S")

# Create session
llm = create_llm("anthropic", model="claude-3-5-sonnet-20241022")
session = Session(provider=llm, tools=[get_current_time])

# 1. Basic generation
response = session.generate("Explain quantum computing")
print(response.content)  # Always has .content attribute

# 2. Tool usage (automatically detected)
response = session.generate("What time is it?")
print(response.content)  # Tools executed transparently

# 3. Streaming without tools
for chunk in session.generate("Count from 1 to 5", stream=True):
    print(chunk.content, end="")  # Every chunk has .content

# 4. Streaming with tools (NEW: Now works consistently!)
for chunk in session.generate("What time is it?", stream=True):
    print(chunk.content, end="")  # Fixed: No more AttributeError!
```

**Key Benefits of Unified API:**
- ✅ **Consistent Interface**: All scenarios return `GenerateResponse` or `Generator[GenerateResponse]`
- ✅ **No More Errors**: Streaming with tools now works without `AttributeError`
- ✅ **Single Method**: No need to choose between `generate()` and `generate_with_tools_streaming()`
- ✅ **Future-Proof**: Follows OpenAI 2025 unified pattern
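
Because every return shape above exposes `.content`, application code that does not care about streaming can normalize both forms with a tiny helper. This is a minimal sketch, not part of the library; it assumes only the behavior shown above (single responses carry `.content`, streams yield chunks that carry `.content`):

```python
from typing import Iterable, Union

def collect_text(result: Union[object, Iterable]) -> str:
    """Return the full text whether `result` is a single response or a stream of chunks."""
    if hasattr(result, "content"):  # single GenerateResponse
        return result.content or ""
    return "".join(chunk.content or "" for chunk in result)  # stream of chunks

# Usage with the session from the example above:
# text = collect_text(session.generate("Explain quantum computing"))
# text = collect_text(session.generate("Count from 1 to 5", stream=True))
```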

### Stateless vs Stateful Architecture

AbstractLLM provides two distinct access patterns for different use cases:

#### **1. Stateless LLM Access (Direct Provider)**
For rapid inference without memory or conversation history:

```python
from abstractllm import create_llm

# Direct provider access - no memory, no conversation history
llm = create_llm("anthropic", model="claude-3-5-sonnet-20241022")

# Simple stateless generation
response = llm.generate("What is quantum computing?")
print(response.content)  # Always returns GenerateResponse object

# Stateless with tools
from abstractllm.tools import register

@register
def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 25°C"

response = llm.generate("What's the weather in Paris?", tools=[get_weather])
print(response.content)  # Tool executed without session memory
```

#### **2. Stateful Session Access (Memory + Conversation)**
For persistent conversations with memory, reasoning, and advanced features:

```python
from abstractllm import create_llm
from abstractllm.session import Session

# Stateful session with memory and conversation history
llm = create_llm("anthropic", model="claude-3-5-sonnet-20241022")
session = Session(provider=llm, enable_memory=True)  # Alpha feature

# Conversation with memory
response1 = session.generate("My name is Alice and I like AI research")
response2 = session.generate("What do you remember about me?")  # Uses memory context

# ReAct reasoning cycles (alpha)
response = session.generate(
    "Analyze the project structure and recommend improvements",
    create_react_cycle=True,     # Alpha: Complete reasoning traces
    use_memory_context=True      # Alpha: Inject relevant memories
)
```

#### **Architecture Flow**
```
Session.generate() → Enhanced with memory/reasoning → Provider.generate() → LLM API
     ↓                                                      ↓
 [Stateful]                                            [Stateless]
 - Memory context                                      - Direct inference
 - ReAct reasoning                                     - Tool execution
 - Conversation history                                - @file parsing
 - Cross-session persistence                           - Response metadata
```

**Note**: We may later simplify AbstractLLM to ONLY handle stateless LLM operations and move memory/agent capabilities to separate packages for better modularity.

### Third-Party Integration

AbstractLLM is designed for easy integration into existing projects:

```python
from abstractllm import create_llm
from abstractllm.session import Session

class MyAIAssistant:
    def __init__(self, provider="openai", model="gpt-4o-mini"):
        self.llm = create_llm(provider, model=model)
        self.session = Session(provider=self.llm, enable_memory=True)  # Alpha feature
    
    def ask(self, question: str) -> str:
        """Ask the assistant a question with memory (alpha)."""
        response = self.session.generate(question)
        return response.content
    
    def ask_with_tools(self, question: str, tools: list) -> str:
        """Ask with tool support using unified API."""
        response = self.session.generate(question, tools=tools)
        return response.content

    def ask_streaming(self, question: str) -> str:
        """Ask with streaming response - unified API ensures consistent .content access."""
        accumulated = ""
        for chunk in self.session.generate(question, stream=True):
            accumulated += chunk.content  # Always available with unified API
        return accumulated

# Usage in your application
assistant = MyAIAssistant(provider="anthropic")
answer = assistant.ask("What did we discuss earlier?")
```

## Agent Development

### ALMA-Simple: Intelligent Agent Example

AbstractLLM includes `alma-simple.py`, a complete example of an agent with memory, reasoning, and tool capabilities:

```bash
# Interactive agent with memory and tools
python alma-simple.py

# Single query with provider switching
python alma-simple.py --provider openai --model gpt-4o-mini \
    --prompt "list the files in the current directory"

# Use enhanced models that work well
python alma-simple.py --provider ollama --model qwen3-coder:30b \
    --prompt "read README.md and summarize it"

# LM Studio - Local models with OpenAI API compatibility
python alma-simple.py --provider lmstudio --model qwen/qwen3-next-80b \
    --prompt "analyze the project structure"
```

**Note**: Our testing shows that `qwen3-coder:30b` works particularly well for coding tasks and tool usage.

### Key Agent Features Demonstrated

```python
from abstractllm.factory import create_session
from abstractllm.tools.common_tools import read_file, list_files, search_files

# Create agent session
session = create_session(
    "anthropic",
    model="claude-3-5-haiku-20241022", 
    enable_memory=True,            # Hierarchical memory (alpha)
    enable_retry=True,             # Retry strategies
    tools=[read_file, list_files], # Tool capabilities
    max_tool_calls=25,             # Prevent infinite loops
    system_prompt="You are a helpful assistant with memory and tools."
)

# Agent can reason, remember, and use tools
response = session.generate(
    prompt="Read the project files and remember the key concepts",
    use_memory_context=True,     # Use relevant memories (alpha)
    create_react_cycle=True,     # Create reasoning trace (alpha)
)
```

## Enhanced Tools System

AbstractLLM features an enhanced tool system with validation capabilities:

### Basic Tool Creation

```python
from abstractllm.tools import tool
from pydantic import Field

@tool(retry_on_error=True, timeout=30.0)
def search_web(
    query: str = Field(description="Search query", min_length=1),
    max_results: int = Field(default=10, ge=1, le=100)
) -> list[str]:
    """Search the web for information.
    
    Args:
        query: The search query to execute
        max_results: Maximum number of results
    """
    # Implementation
    return [f"Result for: {query}"]
```

### Advanced Tool Features

```python
from abstractllm.tools import tool, ToolContext
from pydantic import BaseModel, Field

class SearchResult(BaseModel):
    title: str
    url: str
    relevance: float = Field(ge=0.0, le=1.0)

@tool(
    parse_docstring=True,           # Extract parameter descriptions
    retry_on_error=True,            # Retry on validation errors
    max_retries=3,                  # Maximum retry attempts
    timeout=30.0,                   # Execution timeout
    tags=["search", "web"],         # Categorization
    when_to_use="When user needs current web information",
    requires_context=True,          # Inject session context
    response_model=SearchResult     # Validate response
)
def enhanced_search(
    query: str = Field(min_length=1, max_length=500),
    context: ToolContext = None    # Auto-injected
) -> list[SearchResult]:
    """Enhanced web search with validation."""
    # Access session memory through context
    if context and context.memory:
        relevant_facts = context.memory.search(query)
    
    return [SearchResult(title="Example", url="http://example.com", relevance=0.9)]
```

### Tool System Features

- **Pydantic Validation**: Automatic input/output validation with LLM-friendly error messages
- **Retry Logic**: Intelligent retry on validation errors
- **Docstring Parsing**: Extract parameter descriptions from Google/NumPy/Sphinx docstrings
- **Context Injection**: Access session memory and metadata in tools
- **Timeout Support**: Prevent hanging tool executions
- **Deprecation Warnings**: Mark tools as deprecated with migration messages
- **Universal Compatibility**: Works across all providers (native and prompted)

## Memory & Reasoning (Alpha Testing)

### Hierarchical Memory System

AbstractLLM implements a hierarchical memory architecture (alpha testing):

```python
from abstractllm.memory import HierarchicalMemory
from abstractllm.factory import create_session

# Create session with memory (alpha)
session = create_session(
    "ollama",
    model="qwen3:4b",
    enable_memory=True,              # Alpha feature
    memory_config={
        'working_memory_size': 10,     # Recent context items
        'consolidation_threshold': 5,   # When to consolidate to long-term
        'cross_session_persistence': True  # Remember across sessions
    }
)

# Memory automatically:
# - Extracts facts from conversations
# - Creates knowledge graphs with relationships  
# - Consolidates important information
# - Provides relevant context for new queries
```

### Memory Components

1. **Working Memory**: Recent interactions and context
2. **Episodic Memory**: Consolidated experiences and events
3. **Semantic Memory**: Extracted facts and knowledge graph
4. **ReAct Cycles**: Complete reasoning traces with scratchpads
5. **Bidirectional Links**: Relationships between all memory components

### Example Memory Usage

```python
# Query with memory context (alpha)
response = session.generate(
    "What did I tell you about my project?",
    use_memory_context=True  # Inject relevant memories (alpha)
)

# Create reasoning cycle (alpha)
response = session.generate(
    "Analyze the project structure and make recommendations",
    create_react_cycle=True  # Full ReAct reasoning with scratchpad (alpha)
)

# Access memory directly
if session.memory:
    stats = session.memory.get_statistics()
    print(f"Facts learned: {stats['knowledge_graph']['total_facts']}")
    print(f"ReAct cycles: {stats['total_react_cycles']}")
```

## Response Format & Metadata

All AbstractLLM providers return a consistent `GenerateResponse` object with rich metadata:

### **GenerateResponse Structure**

```python
@dataclass
class GenerateResponse:
    # Core response data
    content: Optional[str] = None              # The actual LLM response text
    raw_response: Any = None                   # Original provider response
    model: Optional[str] = None                # Model that generated the response
    finish_reason: Optional[str] = None        # Why generation stopped

    # Usage and performance metadata
    usage: Optional[Dict[str, int]] = None     # Token counts (prompt/completion/total)

    # Tool execution metadata
    tool_calls: Optional[List[Dict[str, Any]]] = None    # Tools that were called
    tools_executed: Optional[List[Dict[str, Any]]] = None # Execution results

    # Enhanced agent capabilities (Alpha)
    react_cycle_id: Optional[str] = None       # ReAct reasoning cycle ID
    facts_extracted: Optional[List[str]] = None # Knowledge extracted
    reasoning_trace: Optional[str] = None      # Complete reasoning steps
    total_reasoning_time: Optional[float] = None # Time spent reasoning
    scratchpad_file: Optional[str] = None      # Path to detailed traces

    # Vision capabilities
    image_paths: Optional[List[str]] = None    # Images used in generation
```

### **Why This Metadata Matters**

1. **Consistent API**: All providers return the same structure regardless of underlying differences
2. **Observability**: Track token usage, execution time, and tool calls across providers
3. **Agent Capabilities**: Access reasoning traces, extracted facts, and memory updates
4. **Debugging**: Raw responses and detailed traces for troubleshooting
5. **Cost Tracking**: Token usage data for monitoring API costs
6. **Tool Monitoring**: See exactly which tools were called and their results

### **Usage Examples**

```python
# Basic response access
response = llm.generate("Explain machine learning")
print(response.content)                    # The response text
print(response.model)                      # "claude-3-5-sonnet-20241022"
print(response.usage)                      # {"prompt_tokens": 15, "completion_tokens": 150}

# Tool execution metadata
response = llm.generate("What time is it?", tools=[get_time])
print(response.has_tool_calls())           # True
print(response.get_tools_executed())       # ["get_time"]

# Agent reasoning (Alpha)
response = session.generate("Complex task", create_react_cycle=True)
print(response.get_summary())              # "ReAct Cycle: cycle_abc123 | Tools: 2 executed | Facts: 5 extracted"
print(response.get_scratchpad_trace())     # Detailed reasoning steps
print(response.react_cycle_id)             # "cycle_abc123"
```
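
The usage metadata also makes simple cost tracking straightforward (point 5 above). A minimal sketch, assuming the token-count keys shown in the example (`prompt_tokens`, `completion_tokens`) and using placeholder per-1K-token prices rather than real rates:

```python
def estimate_cost(response, prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Rough dollar estimate from a response's usage metadata."""
    usage = response.usage or {}
    prompt_cost = usage.get("prompt_tokens", 0) / 1000 * prompt_price_per_1k
    completion_cost = usage.get("completion_tokens", 0) / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost

# Illustrative prices only:
# print(estimate_cost(response, prompt_price_per_1k=0.003, completion_price_per_1k=0.015))
```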

## Provider Support

### OpenAI - Manual Provider Improvements
```python
# Supported through manual provider enhancements
llm = create_llm("openai", model="gpt-4o-mini") # Vision + tools
llm = create_llm("openai", model="gpt-4o")      # Latest supported model
llm = create_llm("openai", model="gpt-4-turbo")  # Multimodal support

# Enhanced parameters through manual provider improvements
llm = create_llm("openai", 
                 model="gpt-4o",
                 seed=42,                    # Reproducible outputs
                 frequency_penalty=1.0,      # Reduce repetition  
                 presence_penalty=0.5)       # Encourage new topics
```

### Anthropic - Claude Models
```python
llm = create_llm("anthropic", model="claude-3-5-sonnet-20241022")
llm = create_llm("anthropic", model="claude-3-5-haiku-20241022")  # Fast and efficient
```

### Local Models - Ollama & MLX
```python
# Ollama for various open-source models
llm = create_llm("ollama", model="qwen3:4b")         # Good balance
llm = create_llm("ollama", model="qwen3-coder:30b")  # Excellent for coding

# MLX for Apple Silicon (M1/M2/M3)
llm = create_llm("mlx", model="mlx-community/GLM-4.5-Air-4bit")
llm = create_llm("mlx", model="Qwen/Qwen3-4B-MLX-4bit")
```

### HuggingFace - Open Source Models
```python
llm = create_llm("huggingface", model="Qwen/Qwen3-4B")
llm = create_llm("huggingface", model="microsoft/Phi-4-mini-instruct")
```

### LM Studio - Local Model Server
```python
# LM Studio provides OpenAI-compatible API for local models
llm = create_llm("lmstudio",
                 model="qwen/qwen3-next-80b",           # Any model loaded in LM Studio
                 base_url="http://localhost:1234/v1")   # Default LM Studio URL

# Advanced parameters with model capability detection
llm = create_llm("lmstudio",
                 model="qwen/qwen3-next-80b",
                 temperature=0.7,
                 max_tokens=16384,                      # Automatically limited by model
                 base_url="http://localhost:1234/v1")

# Custom server configuration
llm = create_llm("lmstudio",
                 model="llama-3.2-3b-instruct",
                 base_url="http://192.168.1.100:1234/v1")  # Remote LM Studio instance
```
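
Because the server speaks the standard OpenAI-compatible API, you can check that it is reachable and see which models are loaded before wiring it into `create_llm`. A minimal sketch that only hits the stock `/v1/models` endpoint (this helper is not part of AbstractLLM):

```python
import json
from urllib.request import urlopen

def lmstudio_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """List model IDs currently served by a local LM Studio instance."""
    with urlopen(f"{base_url}/models", timeout=5) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

# print(lmstudio_models())  # e.g. ["qwen/qwen3-next-80b"]
```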

#### LM Studio Features
- **OpenAI-Compatible API**: Seamless integration with existing OpenAI code
- **Local Model Hosting**: Run models locally with GPU acceleration
- **Model Auto-Detection**: Automatically detects model capabilities from JSON assets
- **Tool Support**: Works with prompted tool calling for compatible models
- **Memory Management**: Unified `/mem` command shows correct token limits

## Command-Line Examples

### ALMA-Simple Agent Examples

```bash
# Basic usage with different providers
python alma-simple.py --provider anthropic --model claude-3-5-haiku-20241022 \
    --prompt "list the files in the current directory"

python alma-simple.py --provider openai --model gpt-4o-mini \
    --prompt "read README.md and summarize the key features"

python alma-simple.py --provider ollama --model qwen3-coder:30b \
    --prompt "analyze the project structure"

# Advanced usage with memory persistence
python alma-simple.py --memory agent_memory.pkl \
    --prompt "Remember that I'm working on an AI project"

# Interactive mode with verbose logging
python alma-simple.py --verbose

# Control tool usage iterations
python alma-simple.py --max-tool-calls 10 \
    --prompt "carefully examine each file in the project"
```

### Verified Working Configurations

These configurations have been tested and work reliably:

```bash
# OpenAI - Supported models through manual provider improvements
python alma-simple.py --provider openai --model gpt-4o-mini \
    --prompt "list files" --max-tool-calls 3

python alma-simple.py --provider openai --model gpt-4o \
    --prompt "list files" --max-tool-calls 3

# Anthropic - Reliable and fast
python alma-simple.py --provider anthropic --model claude-3-5-haiku-20241022 \
    --prompt "list files" --max-tool-calls 3

# Ollama - Excellent open-source option
python alma-simple.py --provider ollama --model qwen3:4b \
    --prompt "read README.md and summarize it"

# HuggingFace - Direct model usage
python alma-simple.py --provider huggingface --model Qwen/Qwen3-4B \
    --prompt "list the files"

# MLX - Apple Silicon optimized
python alma-simple.py --provider mlx --model mlx-community/GLM-4.5-Air-4bit \
    --prompt "list files"

# LM Studio - Local model server
python alma-simple.py --provider lmstudio --model qwen/qwen3-next-80b \
    --prompt "read README.md and explain the key concepts"
```

**Note**: `qwen3-coder:30b` via Ollama works well for coding tasks and reasoning.

## Architecture Detection & Model Capabilities

AbstractLLM features an intelligent architecture detection system that automatically configures providers and models based on comprehensive JSON assets. The system handles model name normalization, capability detection, and parameter validation across all providers.

### Key Features
- **Automatic Model Detection**: Recognizes 80+ models across 7 architecture families
- **Provider Compatibility**: Handles OpenAI, Anthropic, LM Studio, Ollama, MLX, and HuggingFace
- **Unified Parameter System**: Consistent parameter handling with model capability validation
- **Smart Normalization**: Converts provider-specific names to canonical model identifiers

### Quick Example
```python
from abstractllm import create_llm

# Model capabilities are automatically detected
llm = create_llm("lmstudio", model="qwen/qwen3-next-80b")
# → Detects: 262,144 context / 16,384 output / prompted tools
```

In the interactive agent CLI, the unified `/mem` command reports the detected limits:

```
user> /mem
🧠 Memory System Overview
  Model: qwen/qwen3-next-80b
  Model Max: 262,144 input / 16,384 output
  Token Usage & Limits: ...
```

**📚 For detailed documentation**: See [Architecture Detection & Model Capabilities](docs/architecture-model-detection.md)

## Key Improvements in Recent Versions

### New LM Studio Provider
- **OpenAI-Compatible API**: Seamless integration with LM Studio local model server
- **Automatic Model Detection**: Intelligent capability detection based on JSON assets
- **Unified Memory Management**: Correct token limits and parameter validation
- **Tool Integration**: Prompted tool support for compatible models

### Provider Architecture Improvements
- **Enhanced Model Detection**: Robust model name normalization and capability lookup
- **JSON Asset System**: Comprehensive model capabilities database with 80+ models
- **Unified Parameter System**: Consistent parameter handling across all providers
- **Architecture Templates**: Automatic message formatting for 7+ model families

### Unified Generation API (Latest)
- **API Consistency**: Streaming now always returns `GenerateResponse` objects with `.content` attribute
- **Single Method**: `session.generate()` handles all scenarios (streaming/non-streaming, tools/no-tools)
- **Bug Fix**: Resolved "AttributeError: 'str' object has no attribute 'content'" in streaming tool scenarios
- **Backward Compatible**: `generate_with_tools_streaming()` deprecated but still functional with warnings
- **SOTA Compliance**: Follows OpenAI 2025 unified pattern for consistent developer experience

### OpenAI Provider Improvements
- **Manual Provider Enhancements**: Improved OpenAI provider through custom implementation
- **Enhanced Parameters**: Support for seed, frequency_penalty, presence_penalty
- **Better Error Handling**: Improved API error management and retry logic

### Memory & Reasoning Enhancements  
- **Hierarchical Memory**: Implementation of hierarchical memory management
- **Cross-Session Persistence**: Knowledge preserved across different sessions
- **ReAct Reasoning**: Complete reasoning cycles with scratchpad traces
- **Knowledge Graphs**: Automatic fact extraction and relationship mapping
- **Context-Aware Retrieval**: Memory-enhanced prompting for better responses

### Universal Tool System
- **Enhanced @tool Decorator**: Pydantic validation, retry logic, rich metadata
- **Provider Agnostic**: Works with all providers (native tools or prompted)
- **Context Injection**: Tools can access session memory and metadata
- **Backward Compatible**: Existing @register decorator still supported
- **Production Ready**: Timeouts, confirmations, deprecation warnings

### Architecture Improvements
- **Unified Session System**: Single session class with all capabilities
- **Provider Detection**: Automatic capability detection and optimization
- **Memory Consolidation**: Integration of memory features
- **Error Recovery**: Intelligent fallback and retry strategies

## Recent Implementation Improvements

### **✅ COMPLETED: Fixed Streaming Import Issue**

**Achievement**: Resolved "cannot access free variable 'GenerateResponse'" error with simple import fix.

**Simple Fix Applied**:
- ✅ **Proper imports**: GenerateResponse correctly imported in all providers
- ✅ **Consistent returns**: All providers return GenerateResponse objects uniformly
- ✅ **Streaming fixed**: No more scope/import errors in streaming mode
- ✅ **Architecture preserved**: Kept existing working design, just fixed imports

### **✅ COMPLETED: Removed Legacy Wrapper Classes**

**Achievement**: Cleaned up codebase by removing unnecessary compatibility classes.

**Removed Classes**:
- ❌ `OllamaLLM`, `OpenAILLM`, `AnthropicLLM`, `LMStudioLLM`, `HuggingFaceLLM` (deleted)
- ✅ **Simplified Architecture**: Use `create_llm()` factory method for all provider instantiation
- ✅ **Reduced Maintenance**: Eliminated duplicate wrapper code

## Integration Examples

### Simple Integration
```python
from abstractllm import create_llm

# Drop-in replacement for OpenAI client
def my_ai_function(prompt: str) -> str:
    llm = create_llm("openai", model="gpt-4o-mini")
    return llm.generate(prompt).content

# With provider flexibility  
def flexible_ai(prompt: str, provider: str = "anthropic") -> str:
    llm = create_llm(provider)
    return llm.generate(prompt).content
```

### Advanced Agent Integration
```python
from abstractllm.factory import create_session
from abstractllm.tools import tool

@tool
def get_user_data(user_id: str) -> dict:
    """Fetch user data from your database."""
    return {"name": "Alice", "preferences": ["AI", "coding"]}

class CustomerServiceAgent:
    def __init__(self):
        self.session = create_session(
            "anthropic", 
            model="claude-3-5-sonnet-20241022",
            enable_memory=True,         # Alpha feature
            tools=[get_user_data],
            system_prompt="You are a helpful customer service agent."
        )
    
    def handle_request(self, user_id: str, message: str) -> str:
        prompt = f"User {user_id} says: {message}"
        response = self.session.generate(
            prompt, 
            use_memory_context=True,    # Remember previous interactions (alpha)
            create_react_cycle=True     # Detailed reasoning (alpha)
        )
        return response.content
```

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

**AbstractLLM** - Unified LLM interface with agentic capabilities.
            
