chromadb-semantic


- Name: chromadb-semantic
- Version: 1.0.3
- Upload time: 2023-06-03 16:51:17
- Requirements: none recorded
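# ChromaDB Semantic Search

🐔 Motivation:
- A fast contextual-exploration database with natural-language search.
- Upgrade the scattered text-vector `.pt` files of "Contextual Exploration" (語境探索) into a duckdb+parquet vector database with a fast index.

💣 Pitfalls:
- Python 3.11: fails to build.
- Python 3.10: tested successfully.

This project uses [ChromaDB](https://github.com/chroma-core/chroma) and [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) to implement semantic search over Chinese and English text. Follow the steps below to install and use it.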

## Installation
1. Make sure Python 3.10 is installed.
2. Run the following command to install the required packages:

```bash
python -m pip install chromadb-semantic
```
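
The pitfalls above report that the package builds on Python 3.10 but fails on 3.11. As a minimal sketch, you could run a check like the one below before installing. The helper `is_tested_python` is hypothetical and only encodes the two versions mentioned in this README; other versions are simply untested, not known to fail.

```python
import sys


def is_tested_python(version_info=sys.version_info) -> bool:
    """Return True only for Python 3.10, the version this README reports as working.

    Hypothetical helper: it encodes the README's note that 3.10 works
    and 3.11 fails to build; anything else is treated as untested.
    """
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    if not is_tested_python():
        major, minor = sys.version_info[:2]
        print(f"Warning: running Python {major}.{minor}; "
              "this package was only reported to work on Python 3.10.")
```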

## Usage
1. Place the `chromadb_semantic.py` file in your project folder.
2. Import `chromadb_semantic` in your project and use the functions below:

### `create_chromadb_client()`
Creates a ChromaDB client that stores its data in the `./data` folder.

### `create_collection(client, collection_name)`
Creates a new semantic-search collection.
- `client`: the client object returned by `create_chromadb_client()`.
- `collection_name`: the name of the collection you want to create.

### `add_documents_to_collection(collection, documents, metadatas, ids)`
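Adds documents to the collection.
- `collection`: the collection object returned by `create_collection()`.
- `documents`: a list of documents in Chinese or English; it must correspond one-to-one with `metadatas` and `ids`.
- `metadatas`: a list of metadata dictionaries, one per document.
- `ids`: a list of document IDs, one per document.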

### `query_collection(collection, query_text, n_results)`
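Runs a semantic search over the collection using the query text.
- `collection`: the collection object returned by `create_collection()`.
- `query_text`: the query text.
- `n_results`: the number of results to return for this query.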
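
Conceptually, a semantic query embeds the query text as a vector and returns the stored documents whose vectors are closest to it, smallest distance first. The sketch below illustrates only that ranking step. It uses hand-written toy vectors in place of Sentence Transformers embeddings and squared Euclidean distance as the metric. That metric is an assumption, since this README does not say which one the package configures. `toy_query` is hypothetical and is not the package's implementation.

```python
from typing import Dict, List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_query(store: Dict[str, List[float]],
              query_vec: List[float],
              n_results: int) -> Dict[str, List[List]]:
    """Return the n_results closest IDs and their distances, smallest first.

    The nested [[...]] lists mimic a one-query result layout; this is an
    illustrative sketch, not how chromadb_semantic works internally.
    """
    ranked = sorted(store.items(), key=lambda item: squared_l2(item[1], query_vec))
    top = ranked[:n_results]
    return {
        "ids": [[doc_id for doc_id, _ in top]],
        "distances": [[squared_l2(vec, query_vec) for _, vec in top]],
    }


# Toy 2-D "embeddings"; real ones come from a Sentence Transformers model.
store = {"doc1": [0.0, 1.0], "doc2": [1.0, 0.0], "doc3": [0.0, -1.0]}
print(toy_query(store, [0.9, 0.1], n_results=2))
```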

## Example
The following example uses `chromadb_semantic.py`:

```python
from chromadb_semantic import (
    add_documents_to_collection,
    create_chromadb_client,
    create_collection,
    query_collection,
)

# Client whose data is stored in ./data
client = create_chromadb_client()

collection_name = "my_sentence_transformer_collection"
collection = create_collection(client, collection_name)

# Mixed English/Chinese documents; "玉里鎮的月亮比較大顆" means
# "The moon in Yuli Township is bigger."
documents = ["This is a sample document.", "玉里鎮的月亮比較大顆", "This is a test document."]
ids = ["doc1", "doc2", "doc3"]
metadatas = [{"type": "sample"}, {"type": "example"}, {"type": "test"}]

add_documents_to_collection(collection, documents, metadatas, ids)

# "Which township has the rounder moon?"
query_text = "哪一個鄉鎮的月亮比較圓"
results = query_collection(collection, query_text, n_results=2)

print(f"Results for query: {query_text}")
print(results)
```

Example output:
```
Results for query: 哪一個鄉鎮的月亮比較圓
{'ids': [['doc2', 'doc1']], 'embeddings': None, 'documents': [['玉里鎮的月亮比較大顆', 'This is a sample document.']], 'metadatas': [[{'type': 'example'}, {'type': 'sample'}]], 'distances': [[5.628111839294434, 43.863502502441406]]}
```
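
The result is a dictionary of nested lists. Each key holds one inner list per query, so with a single query everything sits at index `[0]`, and a smaller distance means a closer match. As a sketch, a hypothetical helper like `rows_for_first_query` can pair the fields up per hit. It assumes exactly the key layout shown in the example output above.

```python
from typing import Any, Dict, List, Tuple


def rows_for_first_query(results: Dict[str, Any]) -> List[Tuple[str, str, Dict[str, Any], float]]:
    """Zip ids, documents, metadatas and distances for the first query.

    Hypothetical helper; assumes the key layout of the example output.
    """
    return list(zip(
        results["ids"][0],
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0],
    ))


# Copied from the example output above.
example = {
    "ids": [["doc2", "doc1"]],
    "embeddings": None,
    "documents": [["玉里鎮的月亮比較大顆", "This is a sample document."]],
    "metadatas": [[{"type": "example"}, {"type": "sample"}]],
    "distances": [[5.628111839294434, 43.863502502441406]],
}

for doc_id, text, meta, dist in rows_for_first_query(example):
    print(f"{doc_id}\t{dist:.3f}\t{meta['type']}\t{text}")
```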

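Note: the first time you create a collection and add documents, computing the embeddings may take some time. Subsequent queries will be faster.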

            
