data2vec

Name: data2vec
Version: 0.0.3
home_page: https://github.com/xuehangcang/data2vec
Summary: data2Vec is a Python tool for representing data as vectors.
upload_time: 2023-04-12 15:16:06
author: Xuehang Cang
license: MIT License
keywords: vector, pytorch, vector database
requirements: No requirements were recorded.

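# data2vec

data2Vec is a Python tool for representing data as vectors.

It can convert data of any form into a matrix of vectors and store those vectors in a vector database for later use.

With data2Vec you can handle many data types, such as text, images, and audio, and convert them into vector representations for tasks such as feature extraction, similarity computation, and clustering.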
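
As an illustration of the similarity computation mentioned above, here is a minimal sketch of cosine similarity between two vectors in plain Python. The `cosine_similarity` helper and the toy three-dimensional vectors are assumptions made for this example; they are not part of data2vec, whose real vectors are much longer.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
cat_a = [1.0, 2.0, 3.0]
cat_b = [2.0, 4.0, 6.0]   # same direction as cat_a, so similarity is close to 1
other = [3.0, 0.0, -1.0]  # orthogonal to cat_a, so similarity is 0

print(cosine_similarity(cat_a, cat_b))
print(cosine_similarity(cat_a, other))
```

A score near 1 means the two vectors point in nearly the same direction, which is how a nearest-neighbour query ranks its matches when the index uses the cosine metric.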

## Installation

```
pip install data2Vec
```

## Usage

```python
from data2vec import Img2Vec

img2vec = Img2Vec()

# Vectorize a single image
vec = img2vec.get_vec('../data/cat.0.jpg')
print(vec)

# Vectorize multiple images
vec = img2vec.get_list_vec('../data')
print(vec)
```
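
The usage snippet does not document what `get_vec` and `get_list_vec` return. The Pinecone example in this README indexes the result as `vec[0][1]` and passes it straight to `index.upsert`, which suggests a list of `(id, vector)` pairs keyed by file name. That shape is an inference, not something the library confirms. Under that assumption, a hypothetical `split_pairs` helper could separate the ids from the vectors; the data below is a placeholder, not real output.

```python
def split_pairs(pairs):
    """Split a list of (id, vector) pairs into parallel lists of ids and vectors."""
    ids = [item_id for item_id, _ in pairs]
    vectors = [list(vector) for _, vector in pairs]
    return ids, vectors


# Placeholder standing in for the assumed return value of get_list_vec();
# real vectors would be far longer than three entries.
pairs = [
    ("cat.0.jpg", [0.1, 0.2, 0.3]),
    ("cat.1.jpg", [0.4, 0.5, 0.6]),
]

ids, vectors = split_pairs(pairs)
print(ids)
print(len(vectors), len(vectors[0]))
```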

## Supported models

| Model name | Return vector length |
|------------|----------------------|
| Alexnet    | 1000                 |
| Resnet-18  | 1000                 |
| Resnet-34  | 1000                 |
| Resnet-50  | 1000                 |
| Resnet-101 | 1000                 |
| Resnet-152 | 1000                 |


## Examples

### Storing images in the [`pinecone`](https://www.pinecone.io/) vector database
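
When many images are embedded at once, as in the multi-image case of the example in this section, it may help to send the vectors to the index in smaller batches rather than in one large request. The sketch below shows one way to do that. The `batched` helper, the batch size of 100, and the placeholder pairs are all assumptions made for illustration; check the current Pinecone limits before settling on a size.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder (id, vector) pairs standing in for the assumed output of
# get_list_vec(); real vectors would have 1000 entries each.
pairs = [(f"cat.{i}.jpg", [0.0, 0.0, 0.0]) for i in range(250)]

for batch in batched(pairs):
    # With a real index, this is where index.upsert(batch) would go.
    print(len(batch))
```

With 250 placeholder items the loop produces batches of 100, 100, and 50.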



```python
import pinecone
from data2vec import Img2Vec

# https://www.pinecone.io/

pinecone.init(api_key="xxx", environment="xxx")
img2vec = Img2Vec()
# Store the vector for a single image
# vec = img2vec.get_vec('../data/cat.0.jpg')
# index = pinecone.Index("xxx")
# index.upsert(vec)
# fetch_response = index.fetch(ids=["cat.0.jpg"])
# print(fetch_response)

# Store the vectors for multiple images
# vec = img2vec.get_list_vec('../data')
# index = pinecone.Index("xxx")
# index.upsert(vec)
# fetch_response = index.fetch(ids=["cat.0.jpg"])
# print(fetch_response)

# Similarity query
index = pinecone.Index("xxx")
vec = img2vec.get_vec('../data/cat.0.jpg')
r = index.query(
    vector=vec[0][1],
    top_k=5,
)
print(r)
"""
Using cuda device
resnet18 model loaded
{'matches': [{'id': 'cat.0.jpg', 'score': 1.0, 'values': []},
             {'id': 'cat.7.jpg', 'score': 0.70519489, 'values': []},
             {'id': 'cat.20.jpg', 'score': 0.696186125, 'values': []},
             {'id': 'cat.14.jpg', 'score': 0.691424072, 'values': []},
             {'id': 'cat.8.jpg', 'score': 0.686835527, 'values': []}],
 'namespace': ''}
"""
```

Raw data

{
    "_id": null,
    "home_page": "https://github.com/xuehangcang/data2vec",
    "name": "data2vec",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "Vector Pytorch Vector Database",
    "author": "Xuehang Cang",
    "author_email": "xuehangcang@outlook.com",
    "download_url": "https://files.pythonhosted.org/packages/1c/5c/40ae46b13c8bef0c32d37d1268f0417a8a75d959b3dd496a098f2887bb55/data2vec-0.0.3.tar.gz",
    "platform": "any",
    "description": "# data2vec\r\n\r\ndata2Vec\u662f\u4e00\u4e2aPython\u5de5\u5177\uff0c\u7528\u4e8e\u6570\u636e\u5411\u91cf\u8868\u5f81\u3002\r\n\r\n\u5b83\u53ef\u4ee5\u5c06\u4efb\u4f55\u5f62\u5f0f\u7684\u6570\u636e\u8f6c\u6362\u4e3a\u5411\u91cf\u77e9\u9635\uff0c\u5e76\u80fd\u591f\u5c06\u8fd9\u4e9b\u5411\u91cf\u5b58\u50a8\u5728\u5411\u91cf\u6570\u636e\u5e93\u4e2d\u4ee5\u4f9b\u540e\u7eed\u4f7f\u7528\u3002\r\n\r\n\u4f7f\u7528data2Vec\uff0c\u60a8\u53ef\u4ee5\u8f7b\u677e\u5730\u5904\u7406\u5404\u79cd\u6570\u636e\u7c7b\u578b\uff0c\u5982\u6587\u672c\u3001\u56fe\u50cf\u3001\u97f3\u9891\u7b49\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u5411\u91cf\u8868\u793a\u5f62\u5f0f\uff0c\u4ee5\u4fbf\u4e8e\u7279\u5f81\u63d0\u53d6\u3001\u76f8\u4f3c\u5ea6\u8ba1\u7b97\u3001\u805a\u7c7b\u7b49\u4efb\u52a1\r\n\r\n## \u5b89\u88c5\r\n\r\n```\r\npip install data2Vec\r\n```\r\n\r\n## \u4f7f\u7528\r\n\r\n```python\r\nfrom data2vec import Img2Vec\r\n\r\n# img2vec = Img2Vec()\r\n# \u5411\u91cf\u6570\u636e \u5355\u4e2a\r\n# vec = img2vec.get_vec('../data/cat.0.jpg')\r\n# print(vec)\r\n\r\n# \u5411\u91cf\u6570\u636e \u591a\u4e2a\r\n# vec = img2vec.get_list_vec('../data')\r\n# print(vec)\r\n\r\n```\r\n\r\n## \u652f\u6301\u6a21\u578b\r\n\r\n| Model name | Return vector length |\r\n|------------|----------------------|\r\n| Alexnet    | 1000                 |\r\n| Resnet-18  | 1000                 |\r\n| Resnet-34  | 1000                 |\r\n| Resnet-50  | 1000                 |\r\n| Resnet-101 | 1000                 |\r\n| Resnet-152 | 1000                 |\r\n\r\n\r\n## \u4f8b\u5b50\r\n\r\n### \u56fe\u50cf\u5b58\u50a8\u5230[`pinecone`](https://www.pinecone.io/)\u5411\u91cf\u6570\u636e\u5e93\r\n\r\n\r\n\r\n```python\r\nimport pinecone\r\nfrom data2vec import Img2Vec\r\n\r\n# https://www.pinecone.io/\r\n\r\npinecone.init(api_key=\"xxx\", environment=\"xxx\")\r\nimg2vec = Img2Vec()\r\n# \u5b58\u50a8\u5411\u91cf\u6570\u636e \u5355\u4e2a\r\n# vec = img2vec.get_vec('../data/cat.0.jpg')\r\n# index = 
pinecone.Index(\"xxx\")\r\n# index.upsert(vec)\r\n# fetch_response = index.fetch(ids=[\"cat.0.jpg\"])\r\n# print(fetch_response)\r\n\r\n# \u5b58\u50a8\u5411\u91cf\u6570\u636e \u591a\u4e2a\r\n# vec = img2vec.get_list_vec('../data')\r\n# index = pinecone.Index(\"xxx\")\r\n# index.upsert(vec)\r\n# fetch_response = index.fetch(ids=[\"cat.0.jpg\"])\r\n# print(fetch_response)\r\n\r\n# \u76f8\u4f3c\u5ea6\u67e5\u8be2\r\nindex = pinecone.Index(\"xxx\")\r\nvec = img2vec.get_vec('../data/cat.0.jpg')\r\nr = index.query(\r\n    vector=vec[0][1],\r\n    top_k=5,\r\n)\r\nprint(r)\r\n\"\"\"\r\nUsing cuda device\r\nresnet18 model loaded\r\n{'matches': [{'id': 'cat.0.jpg', 'score': 1.0, 'values': []},\r\n             {'id': 'cat.7.jpg', 'score': 0.70519489, 'values': []},\r\n             {'id': 'cat.20.jpg', 'score': 0.696186125, 'values': []},\r\n             {'id': 'cat.14.jpg', 'score': 0.691424072, 'values': []},\r\n             {'id': 'cat.8.jpg', 'score': 0.686835527, 'values': []}],\r\n 'namespace': ''}\r\n \r\n\"\"\"\r\n",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "data2Vec\u662f\u4e00\u4e2aPython\u5de5\u5177\uff0c\u7528\u4e8e\u6570\u636e\u5411\u91cf\u8868\u5f81\u3002",
    "version": "0.0.3",
    "split_keywords": [
        "vector",
        "pytorch",
        "vector",
        "database"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1c5c40ae46b13c8bef0c32d37d1268f0417a8a75d959b3dd496a098f2887bb55",
                "md5": "93947d8f618345c74a06bd26c872ee0b",
                "sha256": "17640a9ba8dcd05f0deec945e9bd41389eb4d9d0429fa191a4dec0a06e404ffd"
            },
            "downloads": -1,
            "filename": "data2vec-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "93947d8f618345c74a06bd26c872ee0b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4211,
            "upload_time": "2023-04-12T15:16:06",
            "upload_time_iso_8601": "2023-04-12T15:16:06.685729Z",
            "url": "https://files.pythonhosted.org/packages/1c/5c/40ae46b13c8bef0c32d37d1268f0417a8a75d959b3dd496a098f2887bb55/data2vec-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-12 15:16:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "xuehangcang",
    "github_project": "data2vec",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "data2vec"
}
        