# GPT-me
An experiment to build a fully self-sufficient, human-like chatbot that imitates you using various artificial intelligence models.
## Idea
This is the order of steps for how I envision GPT-me functioning.
### Semantic Memories
Memories are still needed for AI self-replication. Unfortunately, summarizing large memory files is a huge source of token consumption, one that poor developers like myself cannot afford. Semantic memories sidestep this by storing past messages and retrieving only the ones most relevant to each incoming message, rather than re-summarizing everything.
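The retrieval side of a semantic memory can be sketched roughly like this. This is a minimal illustration, not the project's implementation: `embed` is a placeholder for a real sentence-embedding model (e.g. one from `sentence-transformers`), and all names here are invented.

```python
from dataclasses import dataclass, field


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class SemanticMemory:
    embed: callable                      # text -> vector; a real embedding model in practice
    entries: list = field(default_factory=list)

    def remember(self, text):
        # Store the text alongside its embedding.
        self.entries.append((text, self.embed(text)))

    def recall(self, query, k=3):
        # Return the k stored texts most similar to the query.
        qv = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(e[1], qv), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Only the few recalled entries ever reach the prompt, which is what keeps token consumption bounded regardless of how large the memory store grows.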
### Document Memories (deprecated)
In order to replicate yourself using AI, memories are needed. In the case of GPT-me, memories would be provided in the form of summarized transcripts of past messages.
The file `scripts/memory.py` will be used to summarize a transcript using a version of [BART](https://huggingface.co/docs/transformers/v4.28.1/en/model_doc/bart) fine-tuned on the [samsum](https://huggingface.co/datasets/samsum) dataset. The fine-tuned model by [philschmid](https://huggingface.co/philschmid) is found on HuggingFace at [philschmid/distilbart-cnn-12-6-samsum](https://huggingface.co/philschmid/distilbart-cnn-12-6-samsum).
The transcript will be split into chunks of several lines by the preprocessor, then those chunks will be summarized by BART.
This process is recursive, and will eventually store only the key details of a person in a small summary of their personality.
The general recursive summary idea was loosely inspired by OpenAI's work on [Summarizing books with human feedback](https://openai.com/research/summarizing-books), but using a dedicated summary transformer instead of GPT-3.
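The chunk-and-summarize loop might look like this. This is only a sketch, not the contents of `scripts/memory.py`: `summarize` stands in for a `transformers` summarization pipeline loaded from `philschmid/distilbart-cnn-12-6-samsum`, and the parameter names and defaults are illustrative.

```python
def chunk_lines(transcript, lines_per_chunk=20):
    """Split a transcript into chunks of at most `lines_per_chunk` lines."""
    lines = transcript.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


def recursive_summarize(transcript, summarize, lines_per_chunk=20, max_lines=5):
    """Repeatedly chunk and summarize until the text fits in `max_lines` lines."""
    while len(transcript.splitlines()) > max_lines:
        summaries = [summarize(chunk) for chunk in chunk_lines(transcript, lines_per_chunk)]
        new = "\n".join(summaries)
        if new == transcript:
            break  # guard against a summarizer that stops shrinking the text
        transcript = new
    return transcript
```

Each pass shrinks the transcript, so the loop converges on a short summary that keeps only the key details.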
### Chatting
1. GPT-me receives a message from some chat app \(many adapters should exist eventually\); this can contain text, links, images, anything.
2. The inputted message is processed and complex elements are simplified. Given links, their content will be crawled and summarized through a route similar to memories. Images should be described by some captioning model \(ex. [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning)\).
3. All provided summary content, along with memories, is passed into a complex gpt-3.5-turbo prompt, which will then carry out the conversation. This prompt will also include a sample of the user's writing style, so as to emulate it.
4. The generated response is sent through the chat app adapter.
5. The generated response will be saved, along with up to \(roughly\) 10 full exchanges of messages from either side. This keeps the model focused on only the most current information.
6. After an indeterminate period of time or number of messages, a new memory set will be automatically rebuilt and the process continues.
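The prompt assembly in step 3 could be sketched like this. This is a hypothetical helper, not the project's actual prompt: the function name, argument names, and system-prompt wording are all assumptions; only the overall shape \(system context + recent history + new message, as passed to a gpt-3.5-turbo chat completion\) follows the steps above.

```python
def build_prompt(memories, summaries, style_sample, history, incoming):
    """Assemble the `messages` list for a gpt-3.5-turbo chat completion.

    memories and summaries are lists of strings, style_sample is a sample of
    the user's writing, history is the ~10 most recent exchanges as
    {"role", "content"} dicts, and incoming is the new message text.
    """
    system = (
        "You are impersonating the user. Reply exactly in their style.\n"
        "Writing sample:\n" + style_sample + "\n"
        "Relevant memories:\n" + "\n".join(memories) + "\n"
        "Context summaries (links, images):\n" + "\n".join(summaries)
    )
    return ([{"role": "system", "content": system}]
            + history
            + [{"role": "user", "content": incoming}])
```

The returned list is what would be handed to the OpenAI chat completion API as `messages`.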
```mermaid
---
title: Exhaustive Single Message Flow
---
flowchart LR
Input["Adapter Input"] --> Message
Message --> Text & Images & Links
Text --> Sem["Semantic Memory"] & DuckDuckGo
DuckDuckGo --> Web["Web Text Crawler"]
Images --> Captioner & OCR
%% OCR --> Sem
Links --> Web
Sem & Captioner & Web & OCR & Text --> ChatGPT
ChatGPT --> Output["Adapter Output"]
```
## Todo
- [ ] Semantic memory question generation. \(ex. `"Peter hit me with a paper today."` -> `["Who is Peter?", "What happened today?"]`\)
- [ ] Integrate web search summaries.
- [ ] Adapter example.
- [ ] Add image captioner.
- [ ] System to modify message into proper style.
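The adapter surface implied above might be as small as this. Since no adapter API exists yet \(it is still a todo\), the class and method names here are invented; the sketch only shows the minimal receive/send contract a platform adapter would need.

```python
from abc import ABC, abstractmethod


class ChatAdapter(ABC):
    """Bridges GPT-me to one chat platform (Discord, SMS, etc.)."""

    @abstractmethod
    def receive(self):
        """Block until a message arrives and return it."""

    @abstractmethod
    def send(self, response):
        """Deliver a generated response back to the platform."""


class EchoAdapter(ChatAdapter):
    """Trivial in-memory adapter, useful for testing the pipeline."""

    def __init__(self, inbox):
        self.inbox = list(inbox)
        self.outbox = []

    def receive(self):
        return self.inbox.pop(0)

    def send(self, response):
        self.outbox.append(response)
```

Keeping the interface this small means each new platform only has to implement message intake and delivery; everything else \(memory, summarization, prompting\) stays platform-agnostic.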