nseekfs

| Field | Value |
| --- | --- |
| Name | nseekfs |
| Version | 1.0.2 |
| home_page | None |
| Summary | High-performance exact vector similarity search with Rust backend |
| upload_time | 2025-09-07 09:44:08 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | MIT |
| keywords | vector, similarity, search, rust, machine-learning, embeddings |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# NSeekFS

[![PyPI version](https://badge.fury.io/py/nseekfs.svg)](https://pypi.org/project/nseekfs)
[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**High-Performance Exact Vector Search with Rust Backend**

Fast and exact cosine similarity search for Python. Built with Rust for performance, designed for production use.

```bash
pip install nseekfs
```

## Quick Start

```python
import nseekfs
import numpy as np

# Create some test vectors
embeddings = np.random.randn(10000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)

# Build index and run a search
index = nseekfs.from_embeddings(embeddings, normalized=True)
results = index.query(query, top_k=10)

print(f"Found {len(results)} results")
print(f"Best match: idx={results[0]['idx']} score={results[0]['score']:.3f}")
```

## Core Features

### Exact Search

```python
# Basic query
results = index.query(query, top_k=10)

# Access results
for item in results:
    print(f"Vector {item['idx']}: {item['score']:.6f}")
```
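
To make "exact" concrete, here is a minimal brute-force reference in plain NumPy. It is a sketch for cross-checking results, not the library's implementation; the function name `exact_cosine_top_k` is hypothetical, and the output mirrors the `idx`/`score` dictionaries used in the example above.

```python
import numpy as np


def exact_cosine_top_k(embeddings, query, top_k=10):
    """Brute-force cosine top-k in NumPy (reference only, not nseekfs code)."""
    emb = np.asarray(embeddings, dtype=np.float32)
    q = np.asarray(query, dtype=np.float32)

    # Cosine similarity = dot product divided by the product of the norms.
    emb_norms = np.linalg.norm(emb, axis=1)
    q_norm = np.linalg.norm(q)
    scores = (emb @ q) / np.maximum(emb_norms * q_norm, 1e-12)

    # argpartition selects the top_k candidates; only those are fully sorted.
    k = min(top_k, len(scores))
    candidates = np.argpartition(-scores, k - 1)[:k]
    ordered = candidates[np.argsort(-scores[candidates])]
    return [{"idx": int(i), "score": float(scores[i])} for i in ordered]
```

On small inputs, the indices returned by this function should agree with those from `index.query`, apart from possible differences in how ties and floating-point rounding are handled.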

### Batch Queries

```python
queries = np.random.randn(50, 384).astype(np.float32)
batch_results = index.query_batch(queries, top_k=5)
print(f"Processed {len(batch_results)} queries")
```
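
For intuition about why batching helps, the sketch below scores every query against every vector with a single matrix product. It is a NumPy reference under stated assumptions, not how `query_batch` is implemented, and the `(indices, scores)` return shape is chosen for this example only; the README does not describe the structure of `batch_results`.

```python
import numpy as np


def batch_cosine_top_k(embeddings, queries, top_k=5):
    """Return (indices, scores), each of shape (n_queries, top_k)."""
    emb = np.asarray(embeddings, dtype=np.float32)
    qs = np.asarray(queries, dtype=np.float32)

    # Normalize both sides so the matrix product yields cosine similarities.
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    qs = qs / np.maximum(np.linalg.norm(qs, axis=1, keepdims=True), 1e-12)

    scores = qs @ emb.T  # shape: (n_queries, n_vectors)
    k = min(top_k, scores.shape[1])

    # Pick the k best per row, then order those k by descending score.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    indices = np.take_along_axis(part, order, axis=1)
    return indices, np.take_along_axis(part_scores, order, axis=1)
```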

### Query Options

```python
# Simple query (alias for query with format="simple")
results = index.query_simple(query, top_k=10)

# Detailed query with timing and diagnostics
result = index.query_detailed(query, top_k=10)
print(f"Query took {result.query_time_ms:.2f} ms, top1 idx={result.results[0]['idx']}")
```

### Index Persistence

```python
# Build and save index
index = nseekfs.from_embeddings(embeddings, normalized=True)
print("Index saved at:", index.index_path)

# Later, reload from file
index2 = nseekfs.from_bin(index.index_path)
print(f"Reloaded index: {index2.rows} vectors x {index2.dims} dims")
```

### Performance Metrics

```python
metrics = index.get_performance_metrics()
print(f"Total queries: {metrics['total_queries']}")
print(f"Average time: {metrics['avg_query_time_ms']:.2f} ms")
```
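
If you want comparable numbers for another search function, a small wrapper can collect the same two statistics. This is a sketch only: `QueryTimer` is a hypothetical helper, not part of nseekfs, and it reuses the `total_queries` and `avg_query_time_ms` keys shown above for convenience.

```python
import time


class QueryTimer:
    """Wraps a search callable and records simple latency statistics."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self._total_queries = 0
        self._total_ms = 0.0

    def __call__(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return self._search_fn(*args, **kwargs)
        finally:
            # Record the elapsed time even if the wrapped call raises.
            self._total_ms += (time.perf_counter() - start) * 1000.0
            self._total_queries += 1

    def metrics(self):
        avg = self._total_ms / self._total_queries if self._total_queries else 0.0
        return {"total_queries": self._total_queries, "avg_query_time_ms": avg}
```

Wrapping is a one-liner, for example `timed_query = QueryTimer(index.query)`, after which `timed_query(query, top_k=10)` behaves like the original call.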

### Built-in Benchmark

```python
nseekfs.benchmark(vectors=1000, dims=384, queries=100, verbose=True)
```

## API Reference

### Index

* `from_embeddings(embeddings, normalized=True, verbose=False)`
* `from_bin(path)`

### Queries

* `query(query_vector, top_k=10)`
* `query_simple(query_vector, top_k=10)`
* `query_detailed(query_vector, top_k=10)`
* `query_batch(queries, top_k=10)`

### Properties

* `index.rows`
* `index.dims`
* `index.config`

### Utilities

* `get_performance_metrics()`
* `benchmark(vectors=..., dims=..., queries=...)`

## Architecture Highlights

### SIMD Optimizations
- AVX2 support: eight float32 lanes per 256-bit instruction on compatible CPUs
- Automatic fallback to scalar operations on older hardware
- Runtime detection of CPU capabilities
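
The lane-wise idea behind the points above can be illustrated in NumPy. This is a conceptual sketch, not the Rust backend: it assumes eight float32 lanes (the width of a 256-bit register) and handles leftover elements with a scalar loop. `dot_lanes` is a hypothetical name.

```python
import numpy as np

LANES = 8  # a 256-bit AVX2 register holds eight float32 values


def dot_lanes(a, b):
    """Dot product computed as 8 lane accumulators plus a scalar tail."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    main = (len(a) // LANES) * LANES

    # "Vector" part: each of the 8 lanes accumulates every 8th product.
    acc = np.zeros(LANES, dtype=np.float32)
    for start in range(0, main, LANES):
        acc += a[start:start + LANES] * b[start:start + LANES]

    # Horizontal sum of the lanes, then a scalar loop for leftover elements.
    total = float(acc.sum())
    for i in range(main, len(a)):
        total += float(a[i]) * float(b[i])
    return total
```

Because the lanes accumulate in a different order than a plain left-to-right sum, results can differ from a scalar loop in the last few bits, which is normal for float32 arithmetic.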

### Memory Management
- Memory mapping for efficient data access
- Thread-local buffers for zero-allocation queries
- Cache-aligned data structures for optimal performance
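
Memory mapping in general can be tried out with NumPy alone. The sketch below uses the `.npy` format and `np.load(..., mmap_mode="r")`; the on-disk format nseekfs uses is not described here, so treat this purely as an illustration of the technique.

```python
import os
import tempfile

import numpy as np

# Write a small float32 matrix to disk in .npy format.
vectors = np.arange(12, dtype=np.float32).reshape(4, 3)
path = os.path.join(tempfile.mkdtemp(), "vectors.npy")
np.save(path, vectors)

# mmap_mode="r" maps the file read-only instead of loading it into RAM;
# pages are read from disk lazily as rows are touched.
mapped = np.load(path, mmap_mode="r")
row = np.array(mapped[2])  # copies only this row into an in-memory array
print(type(mapped).__name__, mapped.shape, row)
```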

### Batch Processing
- Intelligent batching strategies based on query size
- SIMD vectorization across multiple queries
- Optimized memory access patterns

## Installation

```bash
# From PyPI
pip install nseekfs

# Verify installation
python -c "import nseekfs; print('NSeekFS installed successfully')"
```
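
To check which version is installed without importing the package, the standard library's `importlib.metadata` (available from Python 3.8) can be used. `installed_version` is a hypothetical helper name for this sketch.

```python
from importlib import metadata


def installed_version(package="nseekfs"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("nseekfs version:", installed_version() or "not installed")
```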

## Technical Details

- **Precision**: Float32 optimized for standard ML embeddings
- **Memory**: Efficient memory usage with optimized data structures
- **Performance**: Rust backend with SIMD optimizations where available
- **Compatibility**: Python 3.8+ on Windows, macOS, and Linux
- **Thread Safety**: Safe concurrent access from multiple threads
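
A concurrent query pattern might look like the sketch below. To keep it runnable without the package, `search` is a NumPy stand-in; with nseekfs you would pass `index.query` to the pool instead, relying on the thread-safety statement above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 32)).astype(np.float32)
queries = rng.standard_normal((16, 32)).astype(np.float32)


def search(query, top_k=5):
    """Stand-in for index.query: read-only dot-product top-k."""
    scores = embeddings @ query
    return np.argsort(-scores)[:top_k].tolist()


# Each thread only reads the shared matrix, so no locking is needed here.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(search, queries))

sequential = [search(q) for q in queries]
print("threaded results match sequential:", threaded == sequential)
```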

## Performance Tips

```python
# Cast to float32 first, the precision the index is optimized for
embeddings = embeddings.astype(np.float32)

# Pre-normalize vectors if using cosine similarity
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
index = nseekfs.from_embeddings(embeddings, normalized=True)

# Request only as many results as you need
results = index.query(query, top_k=10)  # vs top_k=1000

# Use batch processing for multiple queries
batch_results = index.query_batch(queries, top_k=10)
```
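
One pitfall with the plain division used for pre-normalization is that an all-zero row produces NaN values. A small helper can guard against this; `normalize_rows` and its `eps` parameter are hypothetical names for this sketch and are not part of nseekfs.

```python
import numpy as np


def normalize_rows(vectors, eps=1e-12):
    """Return a float32 array with each row scaled to unit L2 norm.

    All-zero rows are left as zeros instead of becoming NaN.
    """
    arr = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return arr / np.maximum(norms, eps)
```

The result can then be passed to `nseekfs.from_embeddings(..., normalized=True)` in place of the manual division.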

## License

MIT License - see LICENSE file for details.

---

**Fast, exact cosine similarity search for Python.**

*Built with Rust for performance, designed for Python developers.*

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "nseekfs",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Diogo Novo <contact@nseek.io>",
    "keywords": "vector, similarity, search, rust, machine-learning, embeddings",
    "author": null,
    "author_email": "Diogo Novo <contact@nseek.io>",
    "download_url": "https://files.pythonhosted.org/packages/51/4e/e12a0e45869035336583808990a7f0abb992915a48432dcefee1ee7dfd1c/nseekfs-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "High-performance exact vector similarity search with Rust backend",
    "version": "1.0.2",
    "project_urls": {
        "Documentation": "https://github.com/NSeek-AI/nseekfs/wiki",
        "Homepage": "https://github.com/NSeek-AI/nseekfs",
        "Repository": "https://github.com/NSeek-AI/nseekfs.git"
    },
    "split_keywords": [
        "vector",
        " similarity",
        " search",
        " rust",
        " machine-learning",
        " embeddings"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "985c656d8bd385de5a8134a1fdfccaaf87144d0cb5918050e8d161dfb4c9d48e",
                "md5": "7aebee3b00e023b10da3bd94515aaa5e",
                "sha256": "2abe662f51357063a12e5e308ff9604c736bab1aca020a2170088a53431fc317"
            },
            "downloads": -1,
            "filename": "nseekfs-1.0.2-cp38-abi3-macosx_10_12_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7aebee3b00e023b10da3bd94515aaa5e",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 301925,
            "upload_time": "2025-09-07T09:44:03",
            "upload_time_iso_8601": "2025-09-07T09:44:03.365472Z",
            "url": "https://files.pythonhosted.org/packages/98/5c/656d8bd385de5a8134a1fdfccaaf87144d0cb5918050e8d161dfb4c9d48e/nseekfs-1.0.2-cp38-abi3-macosx_10_12_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ef1b70eb20852059779e31b803a08429d6fef4e85850e9d6617fbbcb922fbccd",
                "md5": "09b980c68e927c34a77000129747042b",
                "sha256": "543afe5e6479241399618c456a51014a60c7262296fdabb29594201d37039c6d"
            },
            "downloads": -1,
            "filename": "nseekfs-1.0.2-cp38-abi3-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "09b980c68e927c34a77000129747042b",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 278390,
            "upload_time": "2025-09-07T09:44:05",
            "upload_time_iso_8601": "2025-09-07T09:44:05.110884Z",
            "url": "https://files.pythonhosted.org/packages/ef/1b/70eb20852059779e31b803a08429d6fef4e85850e9d6617fbbcb922fbccd/nseekfs-1.0.2-cp38-abi3-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d285e267b195b654fa2e278f03fcfbfeaa6a401682b8fc31ad6855e86dc2adeb",
                "md5": "cd2c3ab4eb55718ef4c7ed7178093273",
                "sha256": "58f2eb7c2c8dab8da71b5ded71ea207065da302ff71c70d592aa39f470e221d6"
            },
            "downloads": -1,
            "filename": "nseekfs-1.0.2-cp38-abi3-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "cd2c3ab4eb55718ef4c7ed7178093273",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 351905,
            "upload_time": "2025-09-07T09:44:06",
            "upload_time_iso_8601": "2025-09-07T09:44:06.141976Z",
            "url": "https://files.pythonhosted.org/packages/d2/85/e267b195b654fa2e278f03fcfbfeaa6a401682b8fc31ad6855e86dc2adeb/nseekfs-1.0.2-cp38-abi3-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0ce0c06d62ffec0bdae5c51c17308c6d1609692afa467117b616188959ad1ebc",
                "md5": "1bb438f2318204328adb81c2f2e8798d",
                "sha256": "cc8fd265f7e1f0d65cde68402ac6381ac57ec3eccb5cdfd92de0e1108ad53458"
            },
            "downloads": -1,
            "filename": "nseekfs-1.0.2-cp38-abi3-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "1bb438f2318204328adb81c2f2e8798d",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 230851,
            "upload_time": "2025-09-07T09:44:07",
            "upload_time_iso_8601": "2025-09-07T09:44:07.661548Z",
            "url": "https://files.pythonhosted.org/packages/0c/e0/c06d62ffec0bdae5c51c17308c6d1609692afa467117b616188959ad1ebc/nseekfs-1.0.2-cp38-abi3-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "514ee12a0e45869035336583808990a7f0abb992915a48432dcefee1ee7dfd1c",
                "md5": "15dbead451b0a9e2bedf4e8cf10cfcdc",
                "sha256": "82c2664545a19c191a75000fc18205bea6f142f7f7f43be6dc089e0b144e2d06"
            },
            "downloads": -1,
            "filename": "nseekfs-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "15dbead451b0a9e2bedf4e8cf10cfcdc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 86845,
            "upload_time": "2025-09-07T09:44:08",
            "upload_time_iso_8601": "2025-09-07T09:44:08.576051Z",
            "url": "https://files.pythonhosted.org/packages/51/4e/e12a0e45869035336583808990a7f0abb992915a48432dcefee1ee7dfd1c/nseekfs-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-07 09:44:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NSeek-AI",
    "github_project": "nseekfs",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "nseekfs"
}
        