# imbed
Tools to work with embeddings, easily and flexibly.

Note: Work in progress...

To install: ```pip install imbed```

RAG (Retrieval-Augmented Generation) is hugely popular at the moment, but the R part, though it has been around for decades
(mainly under the names "information retrieval" (IR), "search", "indexing", ...), has a lot to contribute to the success, or failure, of the effort.
The [many characteristics of the retrieval part](https://arxiv.org/abs/2312.10997) need to be tuned to align with the final generation and business objectives.
There's still a lot of science to do.

So the last thing we want is to be slowed down by pedestrian aspects of the process.
We want to be agile in getting data prepared and analyzed, so that we spend more time doing science and can iterate on our models quickly.

There are two major areas where `imbed` aims to contribute to that:
* search: getting from raw data to an interface where we can search the information effectively
* visualize: exploring the data visually (which requires yet another kind of embedding, to 2D or 3D vectors)
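
To make the "embedding to 2D" idea in the visualize item concrete, here is a minimal sketch that projects high-dimensional vectors onto their two leading principal components (PCA, computed by power iteration with deflation). It is an illustration only, not `imbed`'s implementation: the names `project_2d` and `_top_direction` are hypothetical, and in practice one would more likely reach for a dedicated method such as UMAP or t-SNE.

```python
import math
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _top_direction(rows, n_iter=100, seed=0):
    """Approximate the leading principal direction of centered rows by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in rows[0]]
    norm = math.sqrt(_dot(v, v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        # Apply X^T X to v without building the covariance matrix explicitly.
        scores = [_dot(row, v) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(len(v))]
        norm = math.sqrt(_dot(w, w))
        if norm == 0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def project_2d(vectors):
    """Hypothetical helper: map each vector to (x, y) along the top two principal directions."""
    dim = len(vectors[0])
    means = [sum(vec[j] for vec in vectors) / len(vectors) for j in range(dim)]
    rows = [[x - m for x, m in zip(vec, means)] for vec in vectors]
    coords = []
    for seed in (0, 1):
        v = _top_direction(rows, seed=seed)
        scores = [_dot(row, v) for row in rows]
        coords.append(scores)
        # Deflation: remove the component just found before looking for the next one.
        rows = [[x - s * vj for x, vj in zip(row, v)] for row, s in zip(rows, scores)]
    return list(zip(*coords))


# Toy input: four 3D points lying on a line.
vectors = [[0, 0, 0], [1, 1, 0], [2, 2, 0], [3, 3, 0]]
points = project_2d(vectors)  # one (x, y) pair per input vector
```

The resulting pairs can be handed to any scatter-plot tool. The sign of each axis is arbitrary, as is usual with PCA.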
What we're looking for here is a setup where, with minimal **configuration** (not code), we can build pipelines: point to the original data, enter a few parameters,
wait, and get a "search controller" (that is, an object that has all the methods we need for retrieval). Here's an example of the kind of interface we'd like to target.
```python
# The store used depends on where the docs live and what format they are in
raw_docs = mk_text_store(doc_src_uri)
# Does not copy any data over; gives a key-value view of the chunked (split) docs
segments = mk_segments_store(raw_docs, ...)
search_ctrl = mk_search_controller(vectorDB, embedder, ...)
search_ctrl.fit(segments, doc_src_uri, ...)
search_ctrl.save(...)
```
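
As a rough illustration of the two ideas in the interface above (a key-value view over chunked docs that copies nothing up front, and a controller that is fit on segments and then queried), here is a self-contained toy sketch. Everything in it is an assumption made for illustration: `SegmentsView`, `SearchController`, `bag_of_words`, and the `search` method are hypothetical names, not `imbed`'s API. A word-count vector stands in for a real embedder, and a plain dict stands in for a vector database.

```python
import math
from collections import Counter
from collections.abc import Mapping


class SegmentsView(Mapping):
    """Read-only key-value view of fixed-size chunks; segments are sliced on access."""

    def __init__(self, docs, chunk_size=40):
        self.docs = docs
        self.chunk_size = chunk_size

    def __iter__(self):
        # Keys are (doc_key, start_offset) pairs.
        for doc_key, text in self.docs.items():
            for start in range(0, len(text), self.chunk_size):
                yield (doc_key, start)

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, key):
        doc_key, start = key
        text = self.docs[doc_key]
        if start % self.chunk_size or not 0 <= start < len(text):
            raise KeyError(key)
        return text[start:start + self.chunk_size]


def bag_of_words(text):
    """Toy embedder: sparse word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


class SearchController:
    """Hypothetical controller: embeds segments on fit, ranks them on search."""

    def __init__(self, embedder):
        self.embedder = embedder
        self.index = {}  # segment key -> vector; a dict stands in for a vector DB

    def fit(self, segments):
        self.index = {key: self.embedder(text) for key, text in segments.items()}
        return self

    def search(self, query, k=3):
        q = self.embedder(query)
        ranked = sorted(self.index, key=lambda key: cosine(q, self.index[key]), reverse=True)
        return ranked[:k]


raw_docs = {
    "a": "cats purr and sleep all day",
    "b": "vector search ranks segments by similarity",
}
segments = SegmentsView(raw_docs, chunk_size=1000)
search_ctrl = SearchController(bag_of_words).fit(segments)
top = search_ctrl.search("similarity search", k=1)
```

Keying segments by `(doc_key, start_offset)` keeps every hit traceable to its source document. That is one possible design choice, not a requirement of the interface.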
## Raw data
{
"_id": null,
"home_page": "https://github.com/thorwhalen/imbed",
"name": "imbed",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Thor Whalen",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/aa/ce/12a90f65dc940d13175c776a8de07713719888fd40e825ab3f53a293569f/imbed-0.0.3.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "mit",
"summary": "Tools to work with embeddings",
"version": "0.0.3",
"project_urls": {
"Homepage": "https://github.com/thorwhalen/imbed"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2dde8bf7a37a29a7eb31e930d040894139e5c98a6a57f89f6055fa1d2e14d59f",
"md5": "73df6581bc079b02694b2ebd0124b2ed",
"sha256": "e72ca9a72ea122f15082a6388954339c4509580302298fb5afc53f5329b345eb"
},
"downloads": -1,
"filename": "imbed-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "73df6581bc079b02694b2ebd0124b2ed",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 20924,
"upload_time": "2024-08-29T14:45:33",
"upload_time_iso_8601": "2024-08-29T14:45:33.910031Z",
"url": "https://files.pythonhosted.org/packages/2d/de/8bf7a37a29a7eb31e930d040894139e5c98a6a57f89f6055fa1d2e14d59f/imbed-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "aace12a90f65dc940d13175c776a8de07713719888fd40e825ab3f53a293569f",
"md5": "d8a3b0f42c1cb0715bee956741ed5aef",
"sha256": "41edb015afe374f9d9966a067e6cc29d4d70e7f945edbff67c17fc1fcca9098b"
},
"downloads": -1,
"filename": "imbed-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "d8a3b0f42c1cb0715bee956741ed5aef",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 20305,
"upload_time": "2024-08-29T14:45:35",
"upload_time_iso_8601": "2024-08-29T14:45:35.280782Z",
"url": "https://files.pythonhosted.org/packages/aa/ce/12a90f65dc940d13175c776a8de07713719888fd40e825ab3f53a293569f/imbed-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-29 14:45:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thorwhalen",
"github_project": "imbed",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "imbed"
}