Dimensia


Name: Dimensia
Version: 0.1.1
Home page: https://github.com/aniruddhasalve/dimensia/
Summary: A custom vector storage and search solution
Upload time: 2024-11-21 06:54:45
Maintainer: None
Docs URL: None
Author: Aniruddha Salve
Requires Python: >=3.8
License: None
Requirements: certifi, charset-normalizer, filelock, fsspec, huggingface-hub, idna, Jinja2, joblib, MarkupSafe, mpmath, networkx, numpy, packaging, pillow, PyYAML, regex, requests, safetensors, scikit-learn, scipy, sentence-transformers, sympy, threadpoolctl, tokenizers, torch, tqdm, transformers, typing_extensions, urllib3
Travis CI: none
Coveralls test coverage: none

# Dimensia

`Dimensia` is a vector database for storing vector embeddings and running semantic search over them. It supports adding documents, performing searches, and managing collections using customizable embedding models. It targets use cases such as information retrieval, recommendation systems, and other machine learning tasks that need fast access to high-dimensional vector data.

## Features

- **Collections**: Create and manage multiple collections of documents with associated metadata.
- **Similarity Search**: Perform semantic search to find the most similar documents in a collection.
- **Document Management**: Add, retrieve, and manage documents by ID within collections.
- **Embedding Model Support**: Easily integrate with models from `sentence-transformers` for generating vector embeddings.
- **Efficient Indexing**: Uses an HNSW (Hierarchical Navigable Small World) index for fast nearest-neighbor search.
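
To make the similarity search feature concrete, here is a minimal sketch of the exact, brute-force nearest-neighbor search that an approximate index such as HNSW is meant to speed up. It is not Dimensia's implementation: the use of cosine similarity, the function names, and the toy three-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, top_k=2):
    """Score every stored vector against the query and return the top_k (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real sentence embeddings have hundreds of dimensions.
toy_vectors = {
    "1": [1.0, 0.0, 0.0],
    "2": [0.0, 1.0, 0.0],
    "3": [0.7, 0.7, 0.0],
}
results = brute_force_search([0.9, 0.1, 0.0], toy_vectors, top_k=2)
for doc_id, score in results:
    print(f"Document ID: {doc_id}, Similarity: {score:.3f}")
```

Brute force compares the query against every stored vector, so its cost grows linearly with collection size. A graph-based index like HNSW trades a small amount of accuracy for visiting far fewer vectors per query.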

## Installation

To install Dimensia, simply run the following command:
```bash
pip install dimensia
```

## Usage

```python
from dimensia import Dimensia

# Initialize the database
db = Dimensia(db_path="dimensia_db")

# Set the embedding model
db.set_embedding_model("sentence-transformers/paraphrase-MiniLM-L6-v2")
print("Embedding model set successfully.")

# Create collections
db.create_collection("collection_1", metadata_schema={"field1": "type1", "field2": "type2"})
db.create_collection("collection_2", metadata_schema={"field1": "type1", "field2": "type2"})
print("Collections created successfully.")

# Verify collections
collections = db.get_collections()
print(f"Collections: {collections}")

# Add documents to the collections
documents_1 = [
    {"id": "1", "content": "This is a document about deep learning."},
    {"id": "2", "content": "This document covers natural language processing."}
]

documents_2 = [
    {"id": "3", "content": "This document is about reinforcement learning."},
    {"id": "4", "content": "This document discusses machine learning in general."}
]

db.add_documents("collection_1", documents_1)
db.add_documents("collection_2", documents_2)
print("Documents added successfully.")

# Perform searches in collections
print("\nPerforming search in Collection 1:")
query_1 = "Tell me about NLP"
results_1 = db.search(query_1, "collection_1", top_k=2)
for result in results_1:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")

print("\nPerforming search in Collection 2:")
query_2 = "What is reinforcement learning?"
results_2 = db.search(query_2, "collection_2", top_k=2)
for result in results_2:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")

# Retrieve collection schema
schema_1 = db.get_collection_schema("collection_1")
print(f"Schema for Collection 1: {schema_1}")

# Retrieve a document by ID
doc_1 = db.get_document("collection_1", "1")
print(f"Retrieved Document from Collection 1: {doc_1}")

# Get vector size (dimension of the embedding)
vector_size = db.get_vector_size()
print(f"Vector size: {vector_size}")

```
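
Search results in the example above are read as dictionaries with a `document` entry and a `score` entry. Assuming that shape, and assuming a higher score means a closer match, a small post-processing step can drop weak matches. The helper `filter_by_score` and the sample scores below are hypothetical and are not part of Dimensia or real output from it.

```python
def filter_by_score(results, min_score):
    """Keep only results whose score is at least min_score, preserving order."""
    return [result for result in results if result["score"] >= min_score]


# Hypothetical results in the shape the usage example reads:
# {"document": {"id": ..., "content": ...}, "score": ...}
sample_results = [
    {"document": {"id": "2", "content": "This document covers natural language processing."}, "score": 0.61},
    {"document": {"id": "1", "content": "This is a document about deep learning."}, "score": 0.18},
]

strong_matches = filter_by_score(sample_results, min_score=0.5)
for result in strong_matches:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")
```

A suitable threshold depends on the embedding model and the data, so it is worth inspecting real scores before fixing a value.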
## Requirements

`Dimensia` requires the following dependencies:
- **`numpy==1.26.4`**
- **`torch==2.2.2`**
- **`sentence-transformers==3.3.1`**
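
Because these versions are pinned exactly, it can help to check what is installed in the current environment. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `find_mismatches` and the injectable `lookup` parameter are illustrative assumptions, not part of Dimensia.

```python
from importlib import metadata

# Pins copied from the list above.
PINNED = {
    "numpy": "1.26.4",
    "torch": "2.2.2",
    "sentence-transformers": "3.3.1",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages that are missing or at another version."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The comparison is a plain string match, so a build that reports a local version suffix (for example a platform-specific `torch` wheel) would be listed as a mismatch even though the base version agrees.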

## Contributing

We welcome contributions to improve Dimensia! Please fork the repository, make your changes, and submit a pull request.

## Support

If you encounter any issues or have questions, please don't hesitate to open an issue on our [GitHub repository](https://github.com/aniruddhasalve/dimensia/). We welcome feedback, bug reports, and feature requests!

We strive to respond as quickly as possible to all issues and questions.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/aniruddhasalve/dimensia/",
    "name": "Dimensia",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "Aniruddha Salve",
    "author_email": "salveaniruddha180@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/d2/89/27bd4b8340de95e60c77b8a42d434190cccf5e1d232db5918c4f6eb2126c/dimensia-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A custom vector storage and search solution",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/aniruddhasalve/dimensia/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e80d6b9a93a6c85ba45f6b9aaa16e8ee9acca216f974c2fccfc38115924e83e3",
                "md5": "4b63a147d96c6a275c448fb92a1aa0f9",
                "sha256": "c4d3ba9f43ffa0af0f57bdd66c5edbe3ff7b6466c581be1f96e320ee893586ad"
            },
            "downloads": -1,
            "filename": "Dimensia-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4b63a147d96c6a275c448fb92a1aa0f9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 9453,
            "upload_time": "2024-11-21T06:54:42",
            "upload_time_iso_8601": "2024-11-21T06:54:42.870759Z",
            "url": "https://files.pythonhosted.org/packages/e8/0d/6b9a93a6c85ba45f6b9aaa16e8ee9acca216f974c2fccfc38115924e83e3/Dimensia-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d28927bd4b8340de95e60c77b8a42d434190cccf5e1d232db5918c4f6eb2126c",
                "md5": "99128aaacbaf3f31284d78e47aa6706d",
                "sha256": "83d6ce68596ddb4459380fe22cccc1beb993352fbf5831edf616995faee53693"
            },
            "downloads": -1,
            "filename": "dimensia-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "99128aaacbaf3f31284d78e47aa6706d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 8465,
            "upload_time": "2024-11-21T06:54:45",
            "upload_time_iso_8601": "2024-11-21T06:54:45.767659Z",
            "url": "https://files.pythonhosted.org/packages/d2/89/27bd4b8340de95e60c77b8a42d434190cccf5e1d232db5918c4f6eb2126c/dimensia-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-21 06:54:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "aniruddhasalve",
    "github_project": "dimensia",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2024.8.30"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "filelock",
            "specs": [
                [
                    "==",
                    "3.16.1"
                ]
            ]
        },
        {
            "name": "fsspec",
            "specs": [
                [
                    "==",
                    "2024.10.0"
                ]
            ]
        },
        {
            "name": "huggingface-hub",
            "specs": [
                [
                    "==",
                    "0.26.2"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.10"
                ]
            ]
        },
        {
            "name": "Jinja2",
            "specs": [
                [
                    "==",
                    "3.1.4"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    "==",
                    "1.4.2"
                ]
            ]
        },
        {
            "name": "MarkupSafe",
            "specs": [
                [
                    "==",
                    "3.0.2"
                ]
            ]
        },
        {
            "name": "mpmath",
            "specs": [
                [
                    "==",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "==",
                    "3.4.2"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "24.2"
                ]
            ]
        },
        {
            "name": "pillow",
            "specs": [
                [
                    "==",
                    "11.0.0"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "regex",
            "specs": [
                [
                    "==",
                    "2024.11.6"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "safetensors",
            "specs": [
                [
                    "==",
                    "0.4.5"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.5.2"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.14.1"
                ]
            ]
        },
        {
            "name": "sentence-transformers",
            "specs": [
                [
                    "==",
                    "3.3.1"
                ]
            ]
        },
        {
            "name": "sympy",
            "specs": [
                [
                    "==",
                    "1.13.3"
                ]
            ]
        },
        {
            "name": "threadpoolctl",
            "specs": [
                [
                    "==",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "tokenizers",
            "specs": [
                [
                    "==",
                    "0.20.3"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.67.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    "==",
                    "4.46.3"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        }
    ],
    "lcname": "dimensia"
}
        