vectorizer-sdk


Name: vectorizer-sdk
Version: 1.0.1
Summary: Python SDK for Vectorizer - Semantic search and vector operations with UMICP protocol support
Upload time: 2025-10-24 00:57:10
Requires Python: >=3.8
Keywords: vectorizer, semantic-search, embeddings, machine-learning, ai, search, vectors, similarity, hivellm, umicp
# Hive Vectorizer Python SDK

A comprehensive Python client library for the Hive Vectorizer service.

## Features

- **Multiple Transport Protocols**: HTTP/HTTPS and UMICP support
- **UMICP Protocol**: High-performance protocol using the umicp-python package
- **Vector Operations**: Insert, search, and manage vectors
- **Collection Management**: Create, delete, and monitor collections
- **Semantic Search**: Find similar content using embeddings
- **Intelligent Search**: Advanced multi-query search with domain expansion
- **Contextual Search**: Context-aware search with metadata filtering
- **Multi-Collection Search**: Cross-collection search with intelligent aggregation
- **Batch Operations**: Efficient bulk operations
- **Error Handling**: Comprehensive exception handling
- **Async Support**: Full async/await support for high performance
- **Type Safety**: Full type hints and validation
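
Multi-collection search exposes `max_per_collection` and `max_total_results` (see `MultiCollectionSearchRequest` in the Quick Start). The README does not specify how the service aggregates results, so the following is only a minimal sketch of one plausible reading, assuming scores from different collections are directly comparable: keep each collection's top hits, merge them, and truncate the global list. `aggregate` is a hypothetical helper, not part of the SDK, and it does not model `cross_collection_reranking`.

```python
from typing import Dict, List, Tuple

# (document id, similarity score); higher score = more similar (assumption)
Hit = Tuple[str, float]


def aggregate(
    per_collection: Dict[str, List[Hit]],
    max_per_collection: int,
    max_total_results: int,
) -> List[Tuple[str, str, float]]:
    """Merge ranked hits from several collections into one global ranking.

    Keeps the top `max_per_collection` hits of each collection, then sorts
    the union by score (descending) and truncates to `max_total_results`.
    """
    merged: List[Tuple[str, str, float]] = []
    for collection, hits in per_collection.items():
        top = sorted(hits, key=lambda h: h[1], reverse=True)[:max_per_collection]
        merged.extend((collection, doc_id, score) for doc_id, score in top)
    merged.sort(key=lambda r: r[2], reverse=True)
    return merged[:max_total_results]
```

The per-collection cap keeps one large or highly similar collection from crowding every other collection out of the final list.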

## Installation

```bash
pip install vectorizer-sdk
```

## Quick Start

```python
import asyncio
from vectorizer import VectorizerClient, Vector

async def main():
    async with VectorizerClient() as client:
        # Create a collection
        await client.create_collection("my_collection", dimension=512)
        
        # Generate embedding
        embedding = await client.embed_text("Hello, world!")
        
        # Build a Vector from the embedding (illustrative; the next step inserts raw text instead)
        vector = Vector(
            id="doc1",
            data=embedding,
            metadata={"text": "Hello, world!"}
        )
        
        # Insert text
        await client.insert_texts("my_collection", [{
            "id": "doc1",
            "text": "Hello, world!",
            "metadata": {"source": "example"}
        }])
        
        # Search for similar vectors
        results = await client.search_vectors(
            collection="my_collection",
            query="greeting",
            limit=5
        )
        
        # Intelligent search with multi-query expansion
        from vectorizer.models import IntelligentSearchRequest  # assumed module path
        intelligent_results = await client.intelligent_search(
            IntelligentSearchRequest(
                query="machine learning algorithms",
                collections=["my_collection", "research"],
                max_results=15,
                domain_expansion=True,
                technical_focus=True,
                mmr_enabled=True,
                mmr_lambda=0.7
            )
        )
        
        # Semantic search with reranking
        from vectorizer.models import SemanticSearchRequest  # assumed module path
        semantic_results = await client.semantic_search(
            SemanticSearchRequest(
                query="neural networks",
                collection="my_collection",
                max_results=10,
                semantic_reranking=True,
                similarity_threshold=0.6
            )
        )
        
        # Contextual search with metadata filtering
        from vectorizer.models import ContextualSearchRequest  # assumed module path
        contextual_results = await client.contextual_search(
            ContextualSearchRequest(
                query="deep learning",
                collection="my_collection",
                context_filters={"category": "AI", "year": 2023},
                max_results=10,
                context_weight=0.4
            )
        )
        
        # Multi-collection search
        from vectorizer.models import MultiCollectionSearchRequest  # assumed module path
        multi_results = await client.multi_collection_search(
            MultiCollectionSearchRequest(
                query="artificial intelligence",
                collections=["my_collection", "research", "tutorials"],
                max_per_collection=5,
                max_total_results=20,
                cross_collection_reranking=True
            )
        )
        
        print(f"Found {len(results)} similar vectors")

asyncio.run(main())
```
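
The intelligent search request above sets `mmr_enabled=True` and `mmr_lambda=0.7`. The README does not document how the service applies MMR; as a generic sketch (not the Vectorizer implementation), standard Maximal Marginal Relevance greedily picks the result that maximizes `lam * relevance - (1 - lam) * redundancy`, where redundancy is the highest similarity to anything already selected. Higher `lam` favors relevance, lower `lam` favors diversity. `cosine` and `mmr` are hypothetical helpers written in plain Python.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def mmr(query: Sequence[float], candidates: List[Sequence[float]], k: int, lam: float = 0.7) -> List[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Penalize similarity to the closest already-selected result
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties resolve to the earliest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `query=[1.0, 0.0]` and candidates `[[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]`, `lam=1.0` keeps the duplicate (`[0, 1]`), while `lam=0.3` lets the redundancy penalty promote the more diverse third candidate (`[0, 2]`).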

## Configuration

### HTTP Configuration (Default)

```python
from vectorizer import VectorizerClient

# Default HTTP configuration
client = VectorizerClient(
    base_url="http://localhost:15002",
    api_key="your-api-key",
    timeout=30
)
```

### UMICP Configuration (High Performance)

[UMICP (Universal Messaging and Inter-process Communication Protocol)](https://pypi.org/project/umicp-python/) is an alternative transport built on the official umicp-python package. It is designed for lower protocol overhead than HTTP, particularly for large payloads.

#### Using Connection String

```python
from vectorizer import VectorizerClient

client = VectorizerClient(
    connection_string="umicp://localhost:15003",
    api_key="your-api-key"
)

print(f"Using protocol: {client.get_protocol()}")  # Output: umicp
```

#### Using Explicit Configuration

```python
from vectorizer import VectorizerClient

client = VectorizerClient(
    protocol="umicp",
    api_key="your-api-key",
    umicp={
        "host": "localhost",
        "port": 15003
    },
    timeout=60
)
```
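
Both UMICP forms above describe the same endpoint. To illustrate how a connection string maps onto the explicit `umicp` dictionary, here is a sketch using the standard library's `urllib.parse.urlparse`. `umicp_config_from_connection_string` is a hypothetical helper, not an SDK function, and the fallback port 15003 is an assumption taken from the examples above.

```python
from urllib.parse import urlparse


def umicp_config_from_connection_string(conn: str, default_port: int = 15003) -> dict:
    """Split 'umicp://host:port' into the host/port dict used by the explicit form.

    Hypothetical helper for illustration; not part of vectorizer-sdk.
    """
    parsed = urlparse(conn)
    if parsed.scheme != "umicp":
        raise ValueError(f"expected a umicp:// URL, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("connection string has no host")
    # Fall back to the port used in the README examples (assumption)
    return {"host": parsed.hostname, "port": parsed.port or default_port}
```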

#### When to Use UMICP

Use UMICP when:
- **Large Payloads**: Inserting or searching large batches of vectors
- **High Throughput**: Need maximum performance for production workloads
- **Low Latency**: Need minimal protocol overhead

Use HTTP when:
- **Development**: Quick testing and debugging
- **Firewall Restrictions**: Only HTTP/HTTPS allowed
- **Simple Deployments**: No need for custom protocol setup
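
One way to act on this guidance without code changes is to choose the transport from configuration. A sketch, assuming hypothetical environment variable names (`VECTORIZER_PROTOCOL`, `VECTORIZER_HOST`, `VECTORIZER_API_KEY`); the returned keyword arguments are the ones shown in the HTTP and UMICP examples above, with the same ports and timeouts.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build VectorizerClient keyword arguments from (hypothetical) env vars."""
    env = os.environ if env is None else env
    api_key = env.get("VECTORIZER_API_KEY", "")
    host = env.get("VECTORIZER_HOST", "localhost")
    if env.get("VECTORIZER_PROTOCOL", "http").lower() == "umicp":
        # Bulk / production path: UMICP on port 15003 with a longer timeout
        return {
            "protocol": "umicp",
            "api_key": api_key,
            "umicp": {"host": host, "port": 15003},
            "timeout": 60,
        }
    # Default path: plain HTTP on port 15002
    return {"base_url": f"http://{host}:15002", "api_key": api_key, "timeout": 30}
```

Usage would then be `client = VectorizerClient(**client_kwargs())`, switching protocols per environment.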

#### Protocol Comparison

| Feature | HTTP/HTTPS | UMICP |
|---------|-----------|-------|
| Transport | aiohttp (standard HTTP) | umicp-python package |
| Performance | Standard | Optimized for large payloads |
| Latency | Standard | Lower overhead |
| Firewall | Widely supported | May require configuration |
| Installation | Default | Requires umicp-python |

#### Installing with UMICP Support

```bash
pip install vectorizer-sdk umicp-python
```
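
Because UMICP support depends on a separate package, it can help to check for it before selecting the UMICP transport. A sketch using `importlib.metadata` (standard library since Python 3.8, matching the SDK's `requires_python`). It looks up the distribution name rather than an import name, since the README does not state which module umicp-python installs; both spellings are tried because some Python versions do not normalize `-` and `_` in distribution names. `umicp_installed` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def umicp_installed() -> bool:
    """Return True if the umicp-python distribution is installed."""
    for name in ("umicp-python", "umicp_python"):
        try:
            version(name)
        except PackageNotFoundError:
            continue
        return True
    return False
```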

## Testing

The SDK includes a test suite of 73 tests (18 basic and 55 comprehensive) plus 12 syntax and import checks, covering data models, exceptions, client operations, and edge cases:

### Running Tests

```bash
# Run basic tests (recommended)
python3 test_simple.py

# Run comprehensive tests
python3 test_sdk_comprehensive.py

# Run all tests with detailed reporting
python3 run_tests.py

# Run specific test
python3 -m unittest test_simple.TestBasicFunctionality
```

### Test Coverage

- **Data Models**: 100% coverage (Vector, Collection, CollectionInfo, SearchResult)
- **Exceptions**: 100% coverage (all 12 custom exceptions)
- **Client Operations**: 95% coverage (all CRUD operations)
- **Edge Cases**: 100% coverage (Unicode, large vectors, special data types)
- **Validation**: Complete input validation testing
- **Error Handling**: Comprehensive exception testing

### Test Results

```
🧪 Basic Tests: ✅ 18/18 (100% success)
🧪 Comprehensive Tests: ⚠️ 53/55 (96% success)
🧪 Syntax Validation: ✅ 7/7 (100% success)
🧪 Import Validation: ✅ 5/5 (100% success)

📊 Overall Success Rate: 75%
⏱️ Total Execution Time: <0.4 seconds
```
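
The per-suite counts in this report sum to 83 of 85 passing checks (about 97.6%), so the 75% overall figure is not a per-check pass rate. A reading consistent with the numbers is the share of suites with no failures (3 of 4), though the report does not state which definition it uses. The arithmetic, using only the counts above:

```python
# Pass counts copied from the test report above: (passed, total)
suites = {
    "Basic Tests": (18, 18),
    "Comprehensive Tests": (53, 55),
    "Syntax Validation": (7, 7),
    "Import Validation": (5, 5),
}

passed = sum(p for p, _ in suites.values())
total = sum(t for _, t in suites.values())
fully_passing = sum(1 for p, t in suites.values() if p == t)

print(f"Individual checks: {passed}/{total} = {passed / total:.1%}")
print(f"Suites with no failures: {fully_passing}/{len(suites)} = {fully_passing / len(suites):.0%}")
```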

### Test Categories

1. **Unit Tests**: Individual component testing
2. **Integration Tests**: Mock-based workflow testing
3. **Validation Tests**: Input validation and error handling
4. **Edge Case Tests**: Unicode, large data, special scenarios
5. **Syntax Tests**: Code compilation and import validation
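
The integration tests are described as mock-based. A minimal sketch of that style, using the standard library's `unittest.mock.AsyncMock` (Python 3.8+) in place of a live client. `count_matches` and `CountMatchesTest` are hypothetical and not taken from the SDK's test files; the mocked `search_vectors` call mirrors the Quick Start signature.

```python
import asyncio
import unittest
from unittest.mock import AsyncMock


async def count_matches(client, collection: str, query: str) -> int:
    """Hypothetical helper under test: run one search and count the hits."""
    results = await client.search_vectors(collection=collection, query=query, limit=5)
    return len(results)


class CountMatchesTest(unittest.TestCase):
    def test_counts_results_from_mocked_client(self):
        client = AsyncMock()  # stands in for VectorizerClient; no server needed
        client.search_vectors.return_value = [{"id": "doc1"}, {"id": "doc2"}]

        count = asyncio.run(count_matches(client, "my_collection", "greeting"))

        self.assertEqual(count, 2)
        client.search_vectors.assert_awaited_once_with(
            collection="my_collection", query="greeting", limit=5
        )
```

Saved as a module, it runs with `python3 -m unittest <module>` like the suites above.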

## Documentation

- [Full Documentation](https://docs.cmmv-hive.org/vectorizer)
- [API Reference](https://docs.cmmv-hive.org/vectorizer/api)
- [Examples](examples.py)
- [Test Documentation](TESTES_RESUMO.md)

## License

MIT License - see LICENSE file for details.

## Support

- GitHub Issues: https://github.com/cmmv-hive/vectorizer/issues
- Email: team@hivellm.org

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "vectorizer-sdk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "vectorizer, semantic-search, embeddings, machine-learning, ai, search, vectors, similarity, hivellm, umicp",
    "author": null,
    "author_email": "HiveLLM Team <team@hivellm.org>",
    "download_url": "https://files.pythonhosted.org/packages/0d/ba/4955d9bcab5dc3330f71c13f6acc2a306a9ab7f4bd2330dc1d90e19de952/vectorizer_sdk-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python SDK for Vectorizer - Semantic search and vector operations with UMICP protocol support",
    "version": "1.0.1",
    "project_urls": {
        "Documentation": "https://github.com/hivellm/vectorizer/tree/main/docs",
        "Homepage": "https://github.com/hivellm/vectorizer",
        "Issues": "https://github.com/hivellm/vectorizer/issues",
        "Repository": "https://github.com/hivellm/vectorizer"
    },
    "split_keywords": [
        "vectorizer",
        " semantic-search",
        " embeddings",
        " machine-learning",
        " ai",
        " search",
        " vectors",
        " similarity",
        " hivellm",
        " umicp"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cc24315fc22a3ea6c8eab8cf6291da2d6cc8a7e996e7bcabb8982f464a9696f0",
                "md5": "38081214fe310c97c3cdead97bcdb8fe",
                "sha256": "b39c73f09450b32c05c72f79d03ef982cbc4eef1ed966d7a6e6ec70df346862d"
            },
            "downloads": -1,
            "filename": "vectorizer_sdk-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "38081214fe310c97c3cdead97bcdb8fe",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 11304,
            "upload_time": "2025-10-24T00:57:08",
            "upload_time_iso_8601": "2025-10-24T00:57:08.766280Z",
            "url": "https://files.pythonhosted.org/packages/cc/24/315fc22a3ea6c8eab8cf6291da2d6cc8a7e996e7bcabb8982f464a9696f0/vectorizer_sdk-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0dba4955d9bcab5dc3330f71c13f6acc2a306a9ab7f4bd2330dc1d90e19de952",
                "md5": "d809196dd2f9897e02ec6c9a11092150",
                "sha256": "4c8c134666ea95f6786c76816166eb39290413887c5d2f2741eb7db3969abbef"
            },
            "downloads": -1,
            "filename": "vectorizer_sdk-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d809196dd2f9897e02ec6c9a11092150",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 36101,
            "upload_time": "2025-10-24T00:57:10",
            "upload_time_iso_8601": "2025-10-24T00:57:10.241990Z",
            "url": "https://files.pythonhosted.org/packages/0d/ba/4955d9bcab5dc3330f71c13f6acc2a306a9ab7f4bd2330dc1d90e19de952/vectorizer_sdk-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-24 00:57:10",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hivellm",
    "github_project": "vectorizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vectorizer-sdk"
}