imbed


Name: imbed
Version: 0.0.3
home_page: https://github.com/thorwhalen/imbed
Summary: Tools to work with embeddings
upload_time: 2024-08-29 14:45:35
maintainer: None
docs_url: None
author: Thor Whalen
requires_python: None
license: mit
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
            
# imbed

Tools to work with embeddings, easily and flexibly.

Note: Work in progress...

To install: ```pip install imbed```

Though RAG (Retrieval-Augmented Generation) is hyper-popular at the moment, the R part, which has been around for decades
(mainly under the names "information retrieval" (IR), "search", "indexing", ...), has a lot to contribute to the success, or failure, of the effort.
The [many characteristics of the retrieval part](https://arxiv.org/abs/2312.10997) need to be tuned to align with the final generation and business objectives.
There's still a lot of science to do.

So the last thing we want is to be slowed down by pedestrian aspects of the process. 
We want to be agile in getting data prepared and analyzed, so we can spend more time doing science and iterate on our models quickly.

There are two major aspects where `imbed` wishes to contribute to that:
* search: getting from raw data to an interface where we can search the information effectively
* visualize: exploring the data visually (which requires yet another kind of embedding, to 2D or 3D vectors)
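
To make the "embedding to 2D" idea concrete, here is a minimal sketch that projects high-dimensional vectors onto their first two principal directions using power iteration in pure Python. PCA is only a stand-in here: this README does not say which planar projection `imbed` uses, and the names `project_2d`, `_center`, and `_top_direction` are hypothetical, not part of the package.

```python
import math
import random


def _center(vectors):
    """Subtract the per-dimension mean from every vector."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    return [[v[j] - means[j] for j in range(dim)] for v in vectors]


def _top_direction(rows, n_iter=200, seed=0):
    """Approximate the leading principal direction of centered rows by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    w = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(n_iter):
        # Apply X^T X to w without forming the covariance matrix.
        scores = [sum(a * b for a, b in zip(r, w)) for r in rows]
        w_new = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w_new))
        if norm == 0:
            break  # no variance left to capture
        w = [x / norm for x in w_new]
    return w


def project_2d(vectors):
    """Project vectors onto their first two principal directions."""
    rows = _center(vectors)
    coords = []
    for _ in range(2):
        w = _top_direction(rows)
        scores = [sum(a * b for a, b in zip(r, w)) for r in rows]
        coords.append(scores)
        # Deflate: remove the component along w so the next pass finds a new direction.
        rows = [[a - s * b for a, b in zip(r, w)] for r, s in zip(rows, scores)]
    return list(zip(*coords))


# Toy 3D "embeddings" that happen to lie on a line.
points = project_2d([[0, 0, 0], [1, 1, 0], [2, 2, 0], [3, 3, 0]])
```

Each input vector comes back as an `(x, y)` pair that can be handed to any plotting tool. For real embeddings, a dedicated dimensionality-reduction library would usually be a better choice than this hand-rolled version.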

What we're looking for is a setup where, with minimal **configuration** (not code), we can build pipelines: point to the original data, enter a few parameters,
wait, and get a "search controller" (that is, an object that has all the methods we need for retrieval). Here's an example of the kind of interface we'd like to target.

```python
raw_docs = mk_text_store(doc_src_uri)  # the store used will depend on the source and format of where the docs are stored
segments = mk_segments_store(raw_docs, ...)  # will not copy any data over, but will give a key-value view of chunked (split) docs
search_ctrl = mk_search_controller(vectorDB, embedder, ...)
search_ctrl.fit(segments, doc_src_uri, ...)
search_ctrl.save(...)
```
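
As a rough illustration of the ideas behind that target interface, the following self-contained toy mimics its shape using only the standard library. Everything in it is an assumption made for illustration: `SegmentsView`, `embed`, `cosine`, and `ToySearchController` are hypothetical names, the bag-of-words "embedder" stands in for a real embedding model, and an in-memory dict stands in for a vector database. It is not `imbed`'s implementation.

```python
import math
import re
from collections import Counter
from collections.abc import Mapping


class SegmentsView(Mapping):
    """Read-only key-value view of fixed-size chunks.

    Segments are sliced from the underlying docs on access, not copied up front.
    Keys are (doc_key, start_offset) pairs.
    """

    def __init__(self, docs, chunk_size=40):
        self.docs = docs
        self.chunk_size = chunk_size

    def __iter__(self):
        for doc_key, text in self.docs.items():
            for start in range(0, len(text), self.chunk_size):
                yield (doc_key, start)

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, key):
        doc_key, start = key
        return self.docs[doc_key][start:start + self.chunk_size]


def embed(text):
    """Toy embedder: sparse bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


class ToySearchController:
    """Embeds segments on fit() and ranks them by cosine similarity on search()."""

    def __init__(self, embedder):
        self.embedder = embedder
        self.vectors = {}

    def fit(self, segments):
        self.vectors = {key: self.embedder(text) for key, text in segments.items()}
        return self

    def search(self, query, k=3):
        q = self.embedder(query)
        ranked = sorted(self.vectors, key=lambda key: cosine(q, self.vectors[key]), reverse=True)
        return ranked[:k]


raw_docs = {
    "a.txt": "Cats purr when they are content.",
    "b.txt": "Vector databases index embeddings for fast search.",
}
segments = SegmentsView(raw_docs, chunk_size=200)
search_ctrl = ToySearchController(embed).fit(segments)
top = search_ctrl.search("search embeddings", k=1)
```

Here `top` holds the key of the best-matching segment, which can be looked up again through `segments[key]`. Keeping the segment store as a view means the chunking strategy can change without rewriting or duplicating the source documents.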




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/thorwhalen/imbed",
    "name": "imbed",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Thor Whalen",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/aa/ce/12a90f65dc940d13175c776a8de07713719888fd40e825ab3f53a293569f/imbed-0.0.3.tar.gz",
    "platform": "any",
    "description": "\n# imbed\n\nTools to work with embeddings, easily an flexibily.\n\nNote: Work in progress...\n\nTo install:\t```pip install imbed```\n\nAs we all know, though RAG (Retrieval Augumented Generation) is hyper-popular at the moment, the R part, though around for decades \n(mainly under the names \"information retrieval\" (IR), \"search\", \"indexing\",...), has a lot to contribute towards the success, or failure, of the effort.\nThe [many characteristics of the retrieval part](https://arxiv.org/abs/2312.10997) need to be tuned to align with the final generation and business objectives. \nThere's still a lot of science to do. \n\nSo the last thing we want is to be slowed down by pedestrian aspects of the process. \nWe want to be agile in getting data prepared and analyzed, so we spend more time doign science, and iterate our models quickly.\n\nThere are two major aspects the `imbed` wishes to contribute two that.\n* search: getting from raw data to an iterface where we can search the information effectively\n* visualize: exploring the data visually (which requires yet another kind of embedding, to 2D or 3D vectors)\n\nWhat we're looking for here is a setup where with minimal **configuration** (not code), we can make pipelines where we can point to the original data, enter a few parameters, \nwait, and get a \"search controller\" (that is, an object that has all the methods we need to do retrieval stuff). Here's an example of the kind of interface we'd like to target.\n\n```python\nraw_docs = mk_text_store(doc_src_uri)  # the store used will depend on the source and format of where the docs are stored\nsegments = mk_segments_store(raw_docs, ...)  # will not copy any data over, but will give a key-value view of chunked (split) docs\nsearch_ctrl = mk_search_controller(vectorDB, embedder, ...)\nsearch_ctrl.fit(segments, doc_src_uri, ...)\nsearch_ctrl.save(...)\n```\n\n\n\n",
    "bugtrack_url": null,
    "license": "mit",
    "summary": "Tools to work with embeddings",
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://github.com/thorwhalen/imbed"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2dde8bf7a37a29a7eb31e930d040894139e5c98a6a57f89f6055fa1d2e14d59f",
                "md5": "73df6581bc079b02694b2ebd0124b2ed",
                "sha256": "e72ca9a72ea122f15082a6388954339c4509580302298fb5afc53f5329b345eb"
            },
            "downloads": -1,
            "filename": "imbed-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "73df6581bc079b02694b2ebd0124b2ed",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 20924,
            "upload_time": "2024-08-29T14:45:33",
            "upload_time_iso_8601": "2024-08-29T14:45:33.910031Z",
            "url": "https://files.pythonhosted.org/packages/2d/de/8bf7a37a29a7eb31e930d040894139e5c98a6a57f89f6055fa1d2e14d59f/imbed-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aace12a90f65dc940d13175c776a8de07713719888fd40e825ab3f53a293569f",
                "md5": "d8a3b0f42c1cb0715bee956741ed5aef",
                "sha256": "41edb015afe374f9d9966a067e6cc29d4d70e7f945edbff67c17fc1fcca9098b"
            },
            "downloads": -1,
            "filename": "imbed-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d8a3b0f42c1cb0715bee956741ed5aef",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 20305,
            "upload_time": "2024-08-29T14:45:35",
            "upload_time_iso_8601": "2024-08-29T14:45:35.280782Z",
            "url": "https://files.pythonhosted.org/packages/aa/ce/12a90f65dc940d13175c776a8de07713719888fd40e825ab3f53a293569f/imbed-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-29 14:45:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "thorwhalen",
    "github_project": "imbed",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "imbed"
}
        