rebrain

Name: rebrain
Version: 0.1.7
Summary: Transform chat history into structured, navigable memory graphs
Upload time: 2025-10-19 03:41:05
Requires Python: >=3.10
License: MIT
Keywords: memory, ai, graph, vector-db, persona, cognition
# 🧠 Rebrain

**Transform chat history into structured, personalized AI memory.**

Rebrain processes your ChatGPT conversations through a 5-step pipeline, extracting observations, synthesizing learnings and cognitions, then building a user persona for hyper-personalized AI interactions.

---

## 🚀 Quick Start (Recommended)

**Using UV - Zero Setup Required**

```bash
# 1. Install UV (one-time)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Set your API key
export GEMINI_API_KEY=your_key_here

# 3. Process your conversations (place conversations.json in current directory)
uvx rebrain pipeline run

# 4. Start MCP server (auto-loads processed data)
uvx rebrain mcp --port 9999

# Advanced: Custom paths and config
uvx rebrain pipeline run --input /path/to/conversations.json --data-path ./my-data
uvx rebrain pipeline run --config custom.yaml  # Custom clustering parameters
uvx rebrain mcp --data-path ./my-data --port 9999 --user-id myproject
```

**That's it!** No Python installation, no virtual environments, no dependencies to manage.

See [INSTALL.md](INSTALL.md) for detailed installation options.

---

## 🎯 For Developers

```bash
# Clone and setup
git clone https://github.com/yasinsb/rebrain.git
cd rebrain
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp env.template .env  # Add your GEMINI_API_KEY

# Run pipeline (bash CLI)
scripts/pipeline/cli.sh all

# Or step by step
scripts/pipeline/cli.sh step1  # Transform & filter
scripts/pipeline/cli.sh step2  # Extract & cluster observations
scripts/pipeline/cli.sh step3  # Synthesize learnings
scripts/pipeline/cli.sh step4  # Synthesize cognitions
scripts/pipeline/cli.sh step5  # Build persona

# Load into memg-core
python scripts/load_memg.py
```

**Output:** `data/persona/persona.md` - ready for system prompts!
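
The persona file drops straight into a system prompt. A minimal sketch, assuming the generic chat-messages format most LLM clients accept (this is not a Rebrain API):

```python
# Minimal sketch: load the generated persona and use it as a system prompt.
# The messages layout below is the common chat-completion format; adapt it
# to whichever LLM client you actually use.
from pathlib import Path

persona = Path("data/persona/persona.md").read_text(encoding="utf-8")

messages = [
    {"role": "system", "content": persona},
    {"role": "user", "content": "Draft a short status update in my usual style."},
]
# Pass `messages` to your LLM client of choice.
```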

---

## 🤖 MCP Integration (Claude Desktop / Cursor)

### HTTP Mode (Recommended for Stability)

**Start the server:**
```bash
# Start with default user_id="rebrain"
uvx --from rebrain rebrain-mcp --data-path ./data --port 9999

# Or use custom user_id for multi-user setups
uvx --from rebrain rebrain-mcp --data-path ./data --port 9999 --user-id myproject

# Using rebrain CLI (equivalent)
uvx rebrain mcp --data-path ./data --port 9999
```

**Add to `~/.cursor/mcp.json` or Claude Desktop config:**
```json
{
  "mcpServers": {
    "rebrain": {
      "url": "http://localhost:9999/mcp"
    }
  }
}
```

### Direct Mode (stdio)

> ⚠️ **Known Issue:** stdio mode has stability issues with Cursor/Claude Desktop. HTTP mode is recommended.

If you still want to try stdio mode:

```json
{
  "mcpServers": {
    "rebrain": {
      "command": "uvx",
      "args": ["--from", "rebrain", "rebrain-mcp", "--data-path", "/absolute/path/to/your/data"],
      "env": {
        "GEMINI_API_KEY": "your_key_here"
      }
    }
  }
}
```

### User ID Configuration

The MCP server supports a configurable `user_id` for memory isolation:
- **Default:** `user_id="rebrain"` - used if not specified
- **Custom:** Pass `--user-id myproject` when starting the server
- **Multi-user:** Each user_id maintains separate memory space
- **Agent Usage:** Agents don't need to provide user_id (uses server default)

**Benefits:**
- 💰 **Process once (~$0.10-0.20), query forever for free** (local memg-core)
- ⚡ **Instant restarts** - database persists, no reprocessing
- 🔒 **100% local** - no ongoing API costs, no cloud lock-in

---

## Quick Start (Legacy)

### 1. Setup

```bash
git clone <repo-url>
cd rebrain

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure
cp env.template .env
# Edit .env: add GEMINI_API_KEY
```

### 2. Prepare Data

Export your ChatGPT conversations and place the JSON file at:
```
data/raw/conversations.json
```
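
Optionally, sanity-check the export first. A small sketch, assuming the standard ChatGPT export layout (a JSON array of conversation objects with a `title` field); adjust if your export differs:

```python
# Optional sanity check before running the pipeline. Assumes the standard
# ChatGPT export: a JSON array of conversation objects with a "title" field.
import json
from pathlib import Path

conversations = json.loads(
    Path("data/raw/conversations.json").read_text(encoding="utf-8")
)
print(f"{len(conversations)} conversations found")
print("Sample titles:", [c.get("title") for c in conversations[:5]])
```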

### 3. Run Pipeline

```bash
# Full pipeline (5 steps)
scripts/pipeline/cli.sh all

# Or run individual steps
scripts/pipeline/cli.sh step1
scripts/pipeline/cli.sh step2
# ... etc
```

### 4. Check Results

```bash
# View pipeline status
scripts/pipeline/cli.sh status

# Read your persona
cat data/persona/persona.md
```

---

## Pipeline Overview

Rebrain uses a 5-stage synthesis pipeline:

```
Raw Conversations (JSON export)
    ↓ Step 1: Transform, Filter & Truncate
Clean Conversations (date filtered, code removed, smart truncation)
    ↓ Step 2: Extract & Cluster Observations (AI + K-Means)
Clustered Observations (~40 clusters by category)
    ↓ Step 3: Synthesize Learnings (AI + K-Means)
Clustered Learnings (~10 clusters)
    ↓ Step 4: Synthesize Cognitions (AI)
High-Level Cognitions (~20 patterns)
    ↓ Step 5: Build Persona (AI)
User Persona (3 plain text sections)
```

**Key Features:**
- **Smart Truncation:** Progressive head+tail strategy (2K start + 3K end) for long conversations (see the sketch after this list)
- **Clean Formatting:** LLM-optimized input format (USER/ASSISTANT, no metadata noise)
- **Privacy-First:** Category-specific filtering at observation extraction
- **Adaptive Clustering:** Finds local optima with tolerance-based K-Means
- **Flexible Models:** Override per-task via prompt template metadata
- **Provenance Tracking:** Full lineage from conversation → observation → learning → cognition
- **Dual Output:** JSON (structured) + Markdown (human-readable)
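
To make the head+tail idea concrete, here is an illustrative sketch only; the character budgets mirror the 2K/3K figures above, but this is not the library's actual implementation:

```python
# Illustrative head+tail truncation (not Rebrain's internal code): keep the
# opening of a long conversation plus its ending, since the start sets the
# context and the end usually carries the resolution.
def head_tail_truncate(text: str, head_chars: int = 2000, tail_chars: int = 3000) -> str:
    if len(text) <= head_chars + tail_chars:
        return text
    return text[:head_chars] + "\n[... truncated ...]\n" + text[-tail_chars:]
```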

---

## Configuration

All pipeline parameters live in `config/pipeline.yaml`:

```yaml
ingestion:
  date_cutoff_days: 180
  remove_code_blocks: true

observation_extraction:
  max_concurrent: 20
  batch_size: 40

learning_clustering:
  target_clusters: 20
  tolerance: 0.2
```

Model selection via prompt templates:
```yaml
# rebrain/prompts/templates/persona_synthesis.yaml
metadata:
  model_recommendation: "gemini-2.5-flash"
```
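
As a sketch of how a per-task override could be resolved (the `default_model` fallback key is hypothetical; see `config/README.md` for the real option names):

```python
# Sketch: prefer the template's model_recommendation, otherwise fall back to
# a default. The `default_model` key is a placeholder, not a documented option.
import yaml
from pathlib import Path

pipeline_cfg = yaml.safe_load(Path("config/pipeline.yaml").read_text())
template = yaml.safe_load(
    Path("rebrain/prompts/templates/persona_synthesis.yaml").read_text()
)

model = (
    template.get("metadata", {}).get("model_recommendation")
    or pipeline_cfg.get("default_model", "gemini-2.5-flash")
)
print(f"Using model: {model}")
```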

See `config/README.md` for details.

---

## CLI Usage

```bash
# Run from the scripts/pipeline/ directory (same cli.sh used above)
# Check what's been generated
./cli.sh status

# Run individual steps
./cli.sh step1 -i data/raw/my_convos.json
./cli.sh step2 --cluster-only  # Re-cluster existing observations
./cli.sh step3

# Clean outputs
./cli.sh clean --all

# Full help
./cli.sh help
```

See `scripts/pipeline/README.md` for details.

---

## Project Structure

```
rebrain/
├── rebrain/              # Core library
│   ├── core/            # GenAI client
│   ├── ingestion/       # Data parsing, truncation & formatting
│   ├── operations/      # Embedder, clusterer, synthesizer
│   ├── prompts/         # Prompt templates (YAML)
│   ├── persona/         # Persona formatting
│   └── schemas/         # Pydantic models
├── config/              # Pipeline configuration
├── scripts/pipeline/    # 5-step pipeline + CLI
├── data/               # Raw → processed → persona
└── notebooks/          # Exploration & testing
```

---

## Output

### Persona (Step 5)

**JSON** (`data/persona/persona.json`):
```json
{
  "model": "gemini-2.5-flash",
  "persona": {
    "personal_profile": "...",
    "communication_preferences": "...",
    "professional_profile": "..."
  }
}
```

**Markdown** (`data/persona/persona.md`):
```markdown
# User Persona Information for AI

## Personal Profile
...

## Communication Preferences
...

## Professional Profile
...
```

Copy-paste ready for system prompts!
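
If you only keep the JSON around, the markdown can be reassembled from it. A sketch, assuming the section keys shown in the JSON example above:

```python
# Sketch: rebuild the markdown persona from persona.json. Section headings
# mirror the example above.
import json
from pathlib import Path

data = json.loads(Path("data/persona/persona.json").read_text(encoding="utf-8"))
sections = {
    "Personal Profile": "personal_profile",
    "Communication Preferences": "communication_preferences",
    "Professional Profile": "professional_profile",
}

parts = ["# User Persona Information for AI"]
for heading, key in sections.items():
    parts.append(f"## {heading}\n{data['persona'].get(key, '')}")
print("\n\n".join(parts))
```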

---

## Development

```bash
# Install dev dependencies
pip install -r requirements_dev.txt

# Run with custom config
python scripts/pipeline/01_transform_filter.py --data-path ./data --config custom.yaml

# Check specific step
python scripts/pipeline/02_extract_cluster_observations.py --skip-cluster
```

---

## Documentation

- **Pipeline Details:** `scripts/pipeline/README.md`
- **Configuration:** `config/README.md`
- **Data Structure:** `data/README.md`
- **Model Override Pattern:** `MODEL_OVERRIDE_PATTERN.md`
- **Persona Builder:** `PERSONA_BUILDER_REFACTOR.md`

---

## License

MIT License - see LICENSE file

---

**Built by [Yasin Salimibeni](https://github.com/yasinsb)**

            
