cuboxgpt


Name: cuboxgpt
Version: 0.1.1
Home page: https://github.com/glazec/cuboxGPT
Summary: Use GPT to chat/search your large Cubox datasets
Upload time: 2023-05-07 17:24:26
Author: Glaze
License: MIT
Keywords: cubox, search, ai, gpt, langchain
Requirements: no requirements were recorded
# cuboxGPT

Use GPT to quickly search and chat with your large Cubox dataset.

# Usage

Install the package.

```bash
pip install cuboxGPT
```

Export the Cubox dataset as an HTML file.
![export](./media/cubox_export.png)

Call the command-line tool:

```bash
# set your OpenAI API key
export OPENAI_API_KEY=<your openai api key>

# Import all Cubox bookmarks and download all web contents.
# Note that the CLI will report links that failed to download and links that do not have enough content.
cuboxgpt import-data <cubox_export.html file location>

# Initialize the vector database: put all downloaded web contents into the vector database and generate embeddings. The database is saved in the db/ folder.
cuboxgpt init-database

# Chat/search with the dataset
cuboxgpt search <query>
```

# Development

```bash
python -m venv ./venv
source ./venv/bin/activate
pip install --editable .
```

`cuboxGPT.py` contains the implementation of all command-line tools.
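
Below is a minimal sketch of how the three subcommands could be wired together with `click`; the actual structure of `cuboxGPT.py`, as well as the helper calls in the comments, are assumptions for illustration only.

```python
# Hypothetical sketch only -- the real cuboxGPT.py may use a different CLI framework.
import click


@click.group()
def cli():
    """cuboxGPT command-line interface."""


@cli.command("import-data")
@click.argument("export_file", type=click.Path(exists=True))
def import_data(export_file):
    """Parse the Cubox HTML export and download the linked pages."""
    click.echo(f"Importing bookmarks from {export_file} ...")
    # webPraser.parse_and_download(export_file)  # assumed helper


@cli.command("init-database")
def init_database():
    """Embed the downloaded contents and persist them under db/."""
    click.echo("Building the vector database ...")
    # db.build_vector_store(...)  # assumed helper


@cli.command("search")
@click.argument("query")
def search(query):
    """Search/chat with the indexed dataset."""
    click.echo(f"Searching for: {query}")
    # chatFromDB.answer(query)  # assumed helper


if __name__ == "__main__":
    cli()
```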

`chatFromDB.py` reads from the database and implements the query function.
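
A rough sketch of what that query path could look like, assuming a LangChain + Chroma stack (the package keywords mention langchain, but the concrete classes and function names here are assumptions, not the actual code):

```python
# Hypothetical sketch -- assumes LangChain + Chroma; the real chatFromDB.py may differ.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma


def answer(query: str, persist_directory: str = "db") -> str:
    # Re-open the persisted vector store created by init-database.
    store = Chroma(
        persist_directory=persist_directory,
        embedding_function=OpenAIEmbeddings(),
    )
    # Retrieve the most similar chunks and let the chat model answer from them.
    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(temperature=0),
        retriever=store.as_retriever(search_kwargs={"k": 4}),
    )
    return qa.run(query)
```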

`webPraser.py` is responsible for parsing the exported HTML file and downloading the web contents.
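
A possible shape of that step, using `requests` and BeautifulSoup as stand-ins (the actual parsing rules, timeouts, and length threshold are assumptions):

```python
# Hypothetical sketch -- the real webPraser.py may handle encodings, retries,
# and site-specific rules differently.
import requests
from bs4 import BeautifulSoup


def parse_and_download(export_path, min_length=200):
    """Return (pages, failed, too_short) from a Cubox HTML export."""
    with open(export_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    pages, failed, too_short = [], [], []
    for link in soup.find_all("a", href=True):
        url = link["href"]
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            failed.append(url)  # reported by the CLI as a failed download
            continue
        text = BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)
        if len(text) < min_length:
            too_short.append(url)  # reported as "not enough content"
        else:
            pages.append((url, text))
    return pages, failed, too_short
```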

`db.py` generates embeddings and saves the web contents to the database.
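
A hedged sketch of that step, again assuming LangChain with a Chroma store persisted under `db/`; the chunking parameters and helper names are illustrative only:

```python
# Hypothetical sketch -- assumes LangChain + Chroma; the real db.py may differ.
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma


def build_vector_store(pages, persist_directory="db"):
    """pages: list of (url, text) tuples produced by the parser."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts, metadatas = [], []
    for url, text in pages:
        for chunk in splitter.split_text(text):
            texts.append(chunk)
            metadatas.append({"source": url})  # lets search results cite the bookmark
    store = Chroma.from_texts(
        texts,
        OpenAIEmbeddings(),
        metadatas=metadatas,
        persist_directory=persist_directory,
    )
    store.persist()  # writes the embeddings to the db/ folder
    return store
```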

`pyproject.toml` contains the ruff lint configuration.

# Roadmap

Goal: enhance the search experience and make it easy to keep datasets up to date.

- [ ] Better CRUD on the database. Users can update/delete individual documents in the database.
- [ ] Search documents with custom filters on metadata.
- [ ] Better parsing rules for certain websites, such as Twitter, YouTube pages with Chinese characters, and Weixin.
- [ ] Better updating experience when the user provides a new Cubox export file.
- [ ] Pagination for search results.
- [ ] Analyze the user's query to better match keywords.
- [ ] For links that failed to download, retry with Selenium.
- [ ] Support multi-threading for downloading web contents.
- [ ] Better titles by supporting Open Graph meta tags.

            
