# cuboxGPT
Use GPT to quickly search and chat with your large Cubox dataset.
# Usage
Install the package.
```bash
pip install cuboxGPT
```
Export your Cubox dataset as an HTML file.
![export](./media/cubox_export.png)
Then run the command-line tool:
```bash
# Set your OpenAI API key.
export OPENAI_API_KEY="<your OpenAI API key>"
# Import all Cubox bookmarks and download their web content.
# The CLI reports links that failed to download and links with too little content.
cuboxgpt import-data <cubox_export.html file location>
# Initialize the vector database: add all downloaded web content, generate embeddings, and save the database in the db/ folder.
cuboxgpt init-database
# Search or chat with the dataset.
cuboxgpt search <query>
```
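
Conceptually, `init-database` stores an embedding vector for each downloaded page, and `search` retrieves the pages whose vectors are closest to the query's (an assumption about the internals, based on the description above). The sketch below illustrates that idea with plain cosine similarity over hand-written toy vectors. It is not cuboxGPT's actual code: the URLs, vectors, and the `cosine`/`search` helpers are hypothetical, and in the real tool the vectors would come from an embedding model and live in the vector database under `db/`.

```python
import math

# Hypothetical toy "embeddings": in cuboxGPT these would come from an
# embedding model; here they are hand-written 3-dimensional vectors.
DOCS = {
    "https://example.com/rust-intro": [0.9, 0.1, 0.0],
    "https://example.com/pasta-recipe": [0.0, 0.2, 0.9],
    "https://example.com/go-concurrency": [0.7, 0.6, 0.1],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, docs, k=2):
    """Return the k document URLs whose vectors are most similar to query_vec."""
    ranked = sorted(docs, key=lambda url: cosine(query_vec, docs[url]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # A made-up query vector that leans toward the "programming" direction.
    print(search([0.8, 0.3, 0.0], DOCS))
```

Running it prints the Rust and Go URLs, in that order. A real vector store performs the same kind of ranking at scale, typically with a nearest-neighbour index instead of a full sort.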
# Development
```bash
python3 -m venv ./venv
source ./venv/bin/activate
pip install --editable .
```
- `cuboxGPT.py` implements all the command-line tools.
- `chatFromDB.py` reads from the database and implements the query function.
- `webPraser.py` parses the exported HTML file and downloads the web content.
- `db.py` generates embeddings and saves the web content to the database.
- `pyproject.toml` contains the Ruff lint configuration.
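
As a rough illustration of the parsing half of `webPraser.py`, the sketch below extracts `(url, title)` pairs from an export file using Python's standard `html.parser`. It assumes, without having checked the real Cubox export format, that each bookmark appears as an `<a href="...">` tag, as in the common Netscape bookmark format. The `BookmarkParser` class and `parse_export` function are hypothetical names, not part of cuboxGPT.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from <a href="..."> tags.

    Assumes each bookmark in the export is an anchor tag (not verified
    against the actual Cubox export format).
    """

    def __init__(self):
        super().__init__()
        self.bookmarks = []  # list of (url, title) tuples
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "A HREF" matches too.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_export(html_text):
    """Return the (url, title) pairs found in the export HTML."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


if __name__ == "__main__":
    sample = ('<DL><DT><A HREF="https://example.com/a">Post A</A>'
              '<DT><A HREF="https://example.com/b">Post B</A></DL>')
    for url, title in parse_export(sample):
        print(url, title)
```

Anchors without an `href` are skipped, so named anchors or other markup in the export do not produce empty bookmarks.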
# Roadmap
Goal: improve the search experience and make it easy to keep datasets up to date.
- [ ] Better CRUD support: users can update or delete individual documents in the database.
- [ ] Search documents with custom metadata filters.
- [ ] Better parsing rules for specific websites such as Twitter, YouTube (pages with Chinese characters), and Weixin.
- [ ] A smoother update flow when the user imports a new Cubox export file.
- [ ] Pagination for search results.
- [ ] Analyze the user's query to better match keywords.
- [ ] Retry links that failed to download using Selenium.
- [ ] Multi-threaded downloading of web content.
- [ ] Better titles via Open Graph meta tags.
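
For the Open Graph item, one possible approach (a sketch, not a committed design) is to prefer a page's `og:title` meta tag and fall back to its `<title>` element. `TitleParser` and `best_title` are hypothetical names; only the Python standard library is used.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Record the first og:title meta tag and the <title> text of a page."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.html_title = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title" and self.og_title is None:
            # Treat a blank content attribute as missing.
            self.og_title = (attrs.get("content") or "").strip() or None
        elif tag == "title":
            self._in_title = True
            self._title_parts = []

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.html_title = "".join(self._title_parts).strip() or None


def best_title(html_text):
    """Prefer the Open Graph title; fall back to <title>, else None."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.og_title or parser.html_title
```

Falling back to `<title>` keeps pages without Open Graph tags usable, while pages that set `og:title` often get a cleaner name than a site-suffixed `<title>`.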
Raw data
{
"_id": null,
"home_page": "https://github.com/glazec/cuboxGPT",
"name": "cuboxgpt",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "Cubox,search,AI,GPT,langchain",
"author": "Glaze",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/a4/a3/6c8cf34f672631169324ef67dbed4e6a55fc0fadb42f9bcf1f5457c3c6db/cuboxgpt-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Use GPT to chat/search your large Cubox datasets",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/glazec/cuboxGPT"
},
"split_keywords": [
"cubox",
"search",
"ai",
"gpt",
"langchain"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cf8cbf1892f6d515bee1846bbb751ec71bd054cdd078979dc4a6c5b1f749aa1c",
"md5": "1c0aad3e661a00ccf802ffa8e9d6ee50",
"sha256": "a9d2380c49754c13c6ba6849e0a8203328f7aca9d574be4768b5861634cb19bf"
},
"downloads": -1,
"filename": "cuboxgpt-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1c0aad3e661a00ccf802ffa8e9d6ee50",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6296,
"upload_time": "2023-05-07T17:24:22",
"upload_time_iso_8601": "2023-05-07T17:24:22.333140Z",
"url": "https://files.pythonhosted.org/packages/cf/8c/bf1892f6d515bee1846bbb751ec71bd054cdd078979dc4a6c5b1f749aa1c/cuboxgpt-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a4a36c8cf34f672631169324ef67dbed4e6a55fc0fadb42f9bcf1f5457c3c6db",
"md5": "edf5610878fabb0522470e39f604122a",
"sha256": "c089e17a7198a6a16be116f6b5b2747a3a6660356ed4b6b20efb85cbf771c8d0"
},
"downloads": -1,
"filename": "cuboxgpt-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "edf5610878fabb0522470e39f604122a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5754,
"upload_time": "2023-05-07T17:24:26",
"upload_time_iso_8601": "2023-05-07T17:24:26.792989Z",
"url": "https://files.pythonhosted.org/packages/a4/a3/6c8cf34f672631169324ef67dbed4e6a55fc0fadb42f9bcf1f5457c3c6db/cuboxgpt-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-07 17:24:26",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "glazec",
"github_project": "cuboxGPT",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "cuboxgpt"
}