| Name | rebrain |
| Version | 0.1.7 |
| download | |
| home_page | None |
| Summary | Transform chat history into structured, navigable memory graphs |
| upload_time | 2025-10-19 03:41:05 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | MIT |
| keywords | memory, ai, graph, vector-db, persona, cognition |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# 🧠 Rebrain
**Transform chat history into structured, personalized AI memory.**
Rebrain processes your ChatGPT conversations through a 5-step pipeline, extracting observations, synthesizing learnings and cognitions, then building a user persona for hyper-personalized AI interactions.
---
## 🚀 Quick Start (Recommended)
**Using UV - Zero Setup Required**
```bash
# 1. Install UV (one-time)
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Set your API key
export GEMINI_API_KEY=your_key_here
# 3. Process your conversations (place conversations.json in current directory)
uvx rebrain pipeline run
# 4. Start MCP server (auto-loads processed data)
uvx rebrain mcp --port 9999
# Advanced: Custom paths and config
uvx rebrain pipeline run --input /path/to/conversations.json --data-path ./my-data
uvx rebrain pipeline run --config custom.yaml # Custom clustering parameters
uvx rebrain mcp --data-path ./my-data --port 9999 --user-id myproject
```
**That's it!** No Python installation, no virtual environments, no dependencies to manage.
See [INSTALL.md](INSTALL.md) for detailed installation options.
---
## 🎯 For Developers
```bash
# Clone and setup
git clone https://github.com/yasinsb/rebrain.git
cd rebrain
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp env.template .env # Add your GEMINI_API_KEY
# Run pipeline (bash CLI)
scripts/pipeline/cli.sh all
# Or step by step
scripts/pipeline/cli.sh step1 # Transform & filter
scripts/pipeline/cli.sh step2 # Extract & cluster observations
scripts/pipeline/cli.sh step3 # Synthesize learnings
scripts/pipeline/cli.sh step4 # Synthesize cognitions
scripts/pipeline/cli.sh step5 # Build persona
# Load into memg-core
python scripts/load_memg.py
```
**Output:** `data/persona/persona.md` - ready for system prompts!
---
## 🤖 MCP Integration (Claude Desktop / Cursor)
### HTTP Mode (Recommended for Stability)
**Start the server:**
```bash
# Start with default user_id="rebrain"
uvx --from rebrain rebrain-mcp --data-path ./data --port 9999
# Or use custom user_id for multi-user setups
uvx --from rebrain rebrain-mcp --data-path ./data --port 9999 --user-id myproject
# Using rebrain CLI (equivalent)
uvx rebrain mcp --data-path ./data --port 9999
```
**Add to `~/.cursor/mcp.json` or Claude Desktop config:**
```json
{
  "mcpServers": {
    "rebrain": {
      "url": "http://localhost:9999/mcp"
    }
  }
}
```
### Direct Mode (stdio)
> ⚠️ **Known Issue:** stdio mode has stability issues with Cursor/Claude Desktop. HTTP mode is recommended.
If you still want to try stdio mode:
```json
{
  "mcpServers": {
    "rebrain": {
      "command": "uvx",
      "args": ["--from", "rebrain", "rebrain-mcp", "--data-path", "/absolute/path/to/your/data"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}
```
### User ID Configuration
The MCP server now supports configurable `user_id` for memory isolation:
- **Default:** `user_id="rebrain"` - used if not specified
- **Custom:** Pass `--user-id myproject` when starting the server
- **Multi-user:** Each user_id maintains separate memory space
- **Agent Usage:** Agents don't need to provide user_id (uses server default)
**Benefits:**
- 💰 **Process once (~$0.10-0.20), query forever for free** (local memg-core)
- ⚡ **Instant restarts** - database persists, no reprocessing
- 🔒 **100% local** - no ongoing API costs, no cloud lock-in
---
## Quick Start (Legacy)
### 1. Setup
```bash
git clone <repo-url>
cd rebrain
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure
cp env.template .env
# Edit .env: add GEMINI_API_KEY
```
### 2. Prepare Data
Export your ChatGPT conversations and place the JSON file at:
```
data/raw/conversations.json
```
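If you want a quick sanity check before running the pipeline, the snippet below verifies the export parses as JSON and counts the top-level entries. It is a minimal sketch using only the standard library; the exact export schema varies between ChatGPT export versions, so it checks nothing beyond that.

```python
# Hedged sketch: confirm the export loads as JSON before running the pipeline.
import json
from pathlib import Path

path = Path("data/raw/conversations.json")
data = json.loads(path.read_text(encoding="utf-8"))

# Exports are typically a top-level list of conversation objects.
count = len(data) if isinstance(data, list) else len(data.get("conversations", []))
print(f"Loaded {path}: {count} top-level entries")
```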
### 3. Run Pipeline
```bash
# Full pipeline (5 steps)
scripts/pipeline/cli.sh all
# Or run individual steps
scripts/pipeline/cli.sh step1
scripts/pipeline/cli.sh step2
# ... etc
```
### 4. Check Results
```bash
# View pipeline status
scripts/pipeline/cli.sh status
# Read your persona
cat data/persona/persona.md
```
---
## Pipeline Overview
Rebrain uses a 5-stage synthesis pipeline:
```
Raw Conversations (JSON export)
  ↓ Step 1: Transform, Filter & Truncate
Clean Conversations (date filtered, code removed, smart truncation)
  ↓ Step 2: Extract & Cluster Observations (AI + K-Means)
Clustered Observations (~40 clusters by category)
  ↓ Step 3: Synthesize Learnings (AI + K-Means)
Clustered Learnings (~10 clusters)
  ↓ Step 4: Synthesize Cognitions (AI)
High-Level Cognitions (~20 patterns)
  ↓ Step 5: Build Persona (AI)
User Persona (3 plain text sections)
```
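To make the lineage between these stages concrete, here is an illustrative sketch of how intermediate records can carry provenance. The field names are assumptions for illustration, not the actual Pydantic schemas in `rebrain/schemas/`.

```python
# Illustrative only: conversation -> observation -> learning -> cognition lineage.
from dataclasses import dataclass, field


@dataclass
class Observation:
    id: str
    conversation_id: str   # provenance: source conversation
    category: str
    text: str


@dataclass
class Learning:
    id: str
    observation_ids: list[str] = field(default_factory=list)  # provenance
    text: str = ""


@dataclass
class Cognition:
    id: str
    learning_ids: list[str] = field(default_factory=list)     # provenance
    text: str = ""
```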
**Key Features:**
- **Smart Truncation:** Progressive head+tail strategy (2K start + 3K end) for long conversations (see the sketch after this list)
- **Clean Formatting:** LLM-optimized input format (USER/ASSISTANT, no metadata noise)
- **Privacy-First:** Category-specific filtering at observation extraction
- **Adaptive Clustering:** Finds local optima with tolerance-based K-Means
- **Flexible Models:** Override per-task via prompt template metadata
- **Provenance Tracking:** Full lineage from conversation → observation → learning → cognition
- **Dual Output:** JSON (structured) + Markdown (human-readable)
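The truncation and formatting ideas above can be pictured with this minimal sketch. The character budgets and function names are illustrative assumptions, not the actual `rebrain.ingestion` API.

```python
# Hedged sketch of head+tail truncation and USER/ASSISTANT formatting.
HEAD_CHARS = 2_000   # assumed "2K start" budget
TAIL_CHARS = 3_000   # assumed "3K end" budget


def truncate_head_tail(text: str, head: int = HEAD_CHARS, tail: int = TAIL_CHARS) -> str:
    """Keep the start and end of a long conversation, dropping the middle."""
    if len(text) <= head + tail:
        return text
    return text[:head] + "\n[... truncated ...]\n" + text[-tail:]


def format_turns(turns: list[dict]) -> str:
    """Render turns as plain USER/ASSISTANT lines, without metadata noise."""
    lines = [f"{turn['role'].upper()}: {turn['content']}" for turn in turns]
    return truncate_head_tail("\n".join(lines))
```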
---
## Configuration
All pipeline parameters live in `config/pipeline.yaml`:
```yaml
ingestion:
  date_cutoff_days: 180
  remove_code_blocks: true

observation_extraction:
  max_concurrent: 20
  batch_size: 40

learning_clustering:
  target_clusters: 20
  tolerance: 0.2
```
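As a rough illustration of how `target_clusters` plus `tolerance` might be interpreted, the sketch below tries k values within ±20% of the target and keeps the best silhouette score. This is an assumption about the behaviour for illustration, not the project's actual clustering code.

```python
# Hedged sketch: tolerance-based K-Means around a target cluster count.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_with_tolerance(embeddings: np.ndarray, target: int = 20, tolerance: float = 0.2):
    """Search k in [target*(1-tol), target*(1+tol)] and keep the best silhouette."""
    k_min = max(2, int(target * (1 - tolerance)))
    k_max = max(k_min, int(target * (1 + tolerance)))
    best_k, best_score, best_labels = k_min, -1.0, None
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```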
Model selection via prompt templates:
```yaml
# rebrain/prompts/templates/persona_synthesis.yaml
metadata:
  model_recommendation: "gemini-2.5-flash"
```
See `config/README.md` for details.
---
## CLI Usage
```bash
# Run from the scripts/pipeline/ directory
# Check what's been generated
./cli.sh status
# Run individual steps
./cli.sh step1 -i data/raw/my_convos.json
./cli.sh step2 --cluster-only # Re-cluster existing observations
./cli.sh step3
# Clean outputs
./cli.sh clean --all
# Full help
./cli.sh help
```
See `scripts/pipeline/README.md` for details.
---
## Project Structure
```
rebrain/
├── rebrain/              # Core library
│   ├── core/             # GenAI client
│   ├── ingestion/        # Data parsing, truncation & formatting
│   ├── operations/       # Embedder, clusterer, synthesizer
│   ├── prompts/          # Prompt templates (YAML)
│   ├── persona/          # Persona formatting
│   └── schemas/          # Pydantic models
├── config/               # Pipeline configuration
├── scripts/pipeline/     # 5-step pipeline + CLI
├── data/                 # Raw → processed → persona
└── notebooks/            # Exploration & testing
```
---
## Output
### Persona (Step 5)
**JSON** (`data/persona/persona.json`):
```json
{
  "model": "gemini-2.5-flash",
  "persona": {
    "personal_profile": "...",
    "communication_preferences": "...",
    "professional_profile": "..."
  }
}
```
**Markdown** (`data/persona/persona.md`):
```markdown
# User Persona Information for AI
## Personal Profile
...
## Communication Preferences
...
## Professional Profile
...
```
Copy-paste ready for system prompts!
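For example, you can inject the generated persona as a system instruction. This is a minimal sketch assuming the `google-genai` client (the pipeline already uses a Gemini key); any chat client that accepts a system prompt works the same way.

```python
# Hedged sketch: load the generated persona into a system prompt.
from pathlib import Path

from google import genai
from google.genai import types

persona = Path("data/persona/persona.md").read_text(encoding="utf-8")

client = genai.Client()  # picks up GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Suggest a weekend project for me.",
    config=types.GenerateContentConfig(system_instruction=persona),
)
print(response.text)
```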
---
## Development
```bash
# Install dev dependencies
pip install -r requirements_dev.txt
# Run with custom config
python scripts/pipeline/01_transform_filter.py --data-path ./data --config custom.yaml
# Check specific step
python scripts/pipeline/02_extract_cluster_observations.py --skip-cluster
```
---
## Documentation
- **Pipeline Details:** `scripts/pipeline/README.md`
- **Configuration:** `config/README.md`
- **Data Structure:** `data/README.md`
- **Model Override Pattern:** `MODEL_OVERRIDE_PATTERN.md`
- **Persona Builder:** `PERSONA_BUILDER_REFACTOR.md`
---
## License
MIT License - see LICENSE file
---
**Built by [Yasin Salimibeni](https://github.com/yasinsb)**