# Nancy Brain
**Turn any GitHub repository into a searchable knowledge base for AI agents.**
Load the complete source code, documentation, examples, and notebooks from any package you're working with. Nancy Brain gives AI assistants instant access to:
- **Full source code** - actual Python classes, methods, implementation details
- **Live documentation** - tutorials, API docs, usage examples
- **Real examples** - Jupyter notebooks, test cases, configuration files
- **Smart weighting** - boost important docs, learning persists across sessions
The AI can now answer questions like "How do I initialize this class?" or "Show me an example of fitting a light curve" with actual code from the repositories you care about.
## 🚀 Quick Start
```bash
# Install anywhere
pip install nancy-brain
# Initialize a new project
nancy-brain init my-ai-project
cd my-ai-project
# Add some repositories
nancy-brain add-repo https://github.com/scikit-learn/scikit-learn.git
# Build the knowledge base
nancy-brain build
# Search it!
nancy-brain search "machine learning algorithms"
# Or launch the web interface
nancy-brain ui
```
## 🌐 Web Admin Interface
Launch the visual admin interface for easy knowledge base management:
```bash
nancy-brain ui
```
Features:
- **🔍 Live Search** - Test your knowledge base with instant results
- **📚 Repository Management** - Add/remove GitHub repos with visual forms
- **📄 Article Management** - Add/remove PDF articles with visual forms
- **🏗️ Build Control** - Trigger knowledge base builds with options
- **📊 System Status** - Check embeddings, configuration, and health
Perfect for non-technical users and rapid prototyping!
## 🖥️ Command Line Interface
```bash
nancy-brain init <project> # Initialize new project
nancy-brain add-repo <url> # Add GitHub repositories
nancy-brain add-article <url> <name> # Add PDF articles
nancy-brain build # Build knowledge base
nancy-brain search "query" # Search knowledge base
nancy-brain serve # Start HTTP API server
nancy-brain ui # Launch web admin interface
```
## Technical Architecture
A lightweight Retrieval-Augmented Generation (RAG) knowledge base with:
- Embedding + search pipeline (txtai / FAISS based)
- HTTP API connector (FastAPI)
- Model Context Protocol (MCP) server connector (tools for search / retrieve / tree / weight)
- Dynamic weighting system (extension/path weights + runtime doc preferences)
Designed to power AI assistants in Slack, IDEs, Claude Desktop, custom GPTs, and any MCP-capable client.
---
## 1. Installation & Quick Setup
### For Users (Recommended)
```bash
# Install the package
pip install nancy-brain
# Initialize a new project
nancy-brain init my-knowledge-base
cd my-knowledge-base
# Add repositories and build
nancy-brain add-repo https://github.com/your-org/repo.git
nancy-brain add-article "https://arxiv.org/pdf/paper.pdf" "paper_name" --description "Important paper"
nancy-brain build
# Launch web interface
nancy-brain ui
```
### For Developers
```bash
# Clone and install in development mode
git clone <repo-url>
cd nancy-brain
pip install -e ."[dev]"
# Test installation
pytest -q
nancy-brain --help
```
---
## 2. Project Layout (Core Parts)
```
nancy_brain/                   # Main Python package
├── cli.py                     # Command line interface
├── admin_ui.py                # Streamlit web admin interface
└── __init__.py                # Package initialization

connectors/http_api/app.py     # FastAPI app
connectors/mcp_server/         # MCP server implementation
rag_core/                      # Core service, search, registry, store, types
scripts/                       # KB build & management scripts
config/repositories.yml        # Source repository list (input KB)
config/weights.yaml            # Extension + path weighting config
config/model_weights.yaml      # (Optional) static per-doc multipliers
```
---
## 3. Configuration
### 3.1 Repositories (`config/repositories.yml`)
Structure (categories map to lists of repos):
```yaml
<category_name>:
  - name: repoA
    url: https://github.com/org/repoA.git
  - name: repoB
    url: https://github.com/org/repoB.git
```
Categories become path prefixes inside the knowledge base (e.g. `cat1/repoA/...`).
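For orientation, here is a minimal sketch of how this file maps to knowledge-base prefixes (it assumes the shape above and requires PyYAML; it is not part of the package API):

```python
# Sketch: derive knowledge-base doc_id prefixes from repositories.yml.
# Assumes the category -> list-of-repos shape shown above; requires PyYAML.
import yaml

with open("config/repositories.yml") as fh:
    categories = yaml.safe_load(fh)

for category, repos in categories.items():
    for repo in repos:
        prefix = f"{category}/{repo['name']}"  # e.g. "cat1/repoA/..."
        print(f"{prefix:<30} <- {repo['url']}")
```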
### 3.2 Weight Config (`config/weights.yaml`)
- `extensions`: base multipliers by file extension (.py, .md, etc.)
- `path_includes`: substring patterns matched against the doc_id; each matching entry multiplies the weight further.
### 3.3 Model Weights (`config/model_weights.yaml`)
Optional static per-document multipliers (legacy / seed values). Runtime updates via the `/weight` endpoint or the MCP `set_weight` tool override or augment these in-memory weights.
### 3.4 Environment Variables
| Var | Purpose | Default |
|-----|---------|---------|
| `USE_DUAL_EMBEDDING` | Enable dual (general + code) embedding scoring | true |
| `CODE_EMBEDDING_MODEL` | Model name for code index (if dual) | microsoft/codebert-base |
| `KMP_DUPLICATE_LIB_OK` | Set to TRUE to avoid OpenMP macOS clash | TRUE |
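If you prefer to pin these in code rather than in the shell, a tiny sketch (values mirror the defaults above; where the service actually reads them is an implementation detail):

```python
# Sketch: pin the environment before launching the service.
# Values mirror the defaults in the table above.
import os

os.environ.setdefault("USE_DUAL_EMBEDDING", "true")
os.environ.setdefault("CODE_EMBEDDING_MODEL", "microsoft/codebert-base")
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")  # avoids the macOS OpenMP clash
```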
---
## 4. Building the Knowledge Base
Embeddings must be built before meaningful search.
### Using the CLI (Recommended)
```bash
# Basic build (repositories only)
nancy-brain build
# Build with PDF articles (if configured)
nancy-brain build --articles-config config/articles.yml
# Force update all repositories
nancy-brain build --force-update
# Or use the web interface
nancy-brain ui # Go to "Build Knowledge Base" page
```
### Using the Python Script Directly
```bash
conda activate nancy-brain
cd src/nancy-brain
# Basic build (repositories only)
python scripts/build_knowledge_base.py \
    --config config/repositories.yml \
    --embeddings-path knowledge_base/embeddings

# Full build including optional PDF articles (if config/articles.yml exists)
python scripts/build_knowledge_base.py \
    --config config/repositories.yml \
    --articles-config config/articles.yml \
    --base-path knowledge_base/raw \
    --embeddings-path knowledge_base/embeddings \
    --force-update \
    --dirty
# Omit the --dirty flag to automatically remove raw source material
# after indexing is complete
```
Run `python scripts/build_knowledge_base.py -h` for all options.
### 4.1 PDF Articles (Optional Quick Setup)
1. Create `config/articles.yml` (example):
```yaml
journal_articles:
  - name: Paczynski_1986_ApJ_304_1
    url: https://ui.adsabs.harvard.edu/link_gateway/1986ApJ...304....1P/PUB_PDF
    description: Paczynski (1986) – Gravitational microlensing
```
2. Install Java (for Tika PDF extraction) – macOS:
```bash
brew install openjdk
export JAVA_HOME="/opt/homebrew/opt/openjdk"
export PATH="$JAVA_HOME/bin:$PATH"
```
3. (Optional fallback only) Install lightweight PDF libs if you skip Java:
```bash
pip install PyPDF2 pdfplumber
```
4. Build with articles (explicit):
```bash
python scripts/build_knowledge_base.py --config config/repositories.yml --articles-config config/articles.yml
```
5. Keep raw PDFs for inspection: add `--dirty`.
Notes:
- If Java/Tika is not available, the script attempts fallback extraction (requires PyPDF2, pdfplumber, or fitz/PyMuPDF).
- Cleanup removes raw PDFs unless `--dirty` is supplied.
- Article docs are indexed under `journal_articles/<category>/<name>`.
Key flags:
- `--config` path to repositories YAML (was `--repositories` in older docs)
- `--articles-config` optional PDF articles YAML
- `--base-path` where raw repos/PDFs live (default knowledge_base/raw)
- `--embeddings-path` output index directory
- `--force-update` re-pull repos / re-download PDFs
- `--category <name>` limit to one category
- `--dry-run` show planned actions without performing them
- `--dirty` keep raw sources (skip cleanup)
This will:
1. Clone / update listed repos under `knowledge_base/raw/<category>/<repo>`
2. (Optionally) download PDFs into category directories
3. Convert notebooks (*.ipynb -> *.nb.txt) if nb4llm is available
4. Extract and normalize text (and, optionally, PDF text)
5. Build / update embeddings index at `knowledge_base/embeddings` (and `code_index` if dual embeddings enabled)
Re-run when repositories or articles change.
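To sanity-check a finished build outside the CLI, you can query the index directly with txtai. A sketch (it assumes the default `knowledge_base/embeddings` path holds a standard txtai index):

```python
# Sketch: load the built index with txtai and run a quick test query.
# Assumes the default embeddings path and a standard txtai index layout.
from txtai.embeddings import Embeddings

embeddings = Embeddings()
embeddings.load("knowledge_base/embeddings")

for hit in embeddings.search("light curve fitting", 5):
    print(hit)  # (id, score) tuples or dicts, depending on index configuration
```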
---
## 5. Running Services
### Web Admin Interface (Recommended for Getting Started)
```bash
nancy-brain ui
# Opens Streamlit interface at http://localhost:8501
# Features: search, repo management, build control, status
```
### HTTP API Server
```bash
# Using CLI
nancy-brain serve
# Or directly with uvicorn
uvicorn connectors.http_api.app:app --host 0.0.0.0 --port 8000
```
### MCP Server (for AI Assistants)
```bash
# Run MCP stdio server
python run_mcp_server.py
```
Initialize service programmatically (example pattern):
```python
from pathlib import Path
from connectors.http_api.app import initialize_rag_service
initialize_rag_service(
    config_path=Path('config/repositories.yml'),
    embeddings_path=Path('knowledge_base/embeddings'),
    weights_path=Path('config/weights.yaml'),
    use_dual_embedding=True,
)
```
The FastAPI dependency layer will then serve requests.
### Command Line Search
```bash
# Quick search from command line
nancy-brain search "machine learning algorithms" --limit 5
# Search with custom paths
nancy-brain search "neural networks" \
--embeddings-path custom/embeddings \
--config custom/repositories.yml
```
### 5.1 Endpoints (Bearer auth placeholder)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Service status |
| GET | `/version` | Index / build meta |
| GET | `/search?query=...&limit=N` | Search documents |
| POST | `/retrieve` | Retrieve passage (doc_id + line range) |
| POST | `/retrieve/batch` | Batch retrieve |
| GET | `/tree?prefix=...` | List KB tree |
| POST | `/weight` | Set runtime doc weight |
Example:
```bash
curl -H "Authorization: Bearer TEST" 'http://localhost:8000/search?query=light%20curve&limit=5'
```
Set a document weight (boost factor 0.5–2.0 typical):
```bash
curl -X POST -H 'Authorization: Bearer TEST' \
    -H 'Content-Type: application/json' \
    -d '{"doc_id":"cat1/repoA/path/file.py","multiplier":2.0}' \
    http://localhost:8000/weight
```
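The same endpoints are easy to drive from Python with `requests`. A sketch (the `/retrieve` payload fields `doc_id`, `start`, `end` are assumed from the table and the MCP tool signatures; check the FastAPI models for the exact schema):

```python
# Sketch: call the HTTP API from Python. The /retrieve payload fields are
# assumed from the endpoint table and MCP tool signatures; verify against
# the FastAPI request models.
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer TEST"}

hits = requests.get(
    f"{BASE}/search",
    params={"query": "light curve", "limit": 5},
    headers=HEADERS,
).json()
print(hits)

passage = requests.post(
    f"{BASE}/retrieve",
    json={"doc_id": "cat1/repoA/path/file.py", "start": 0, "end": 40},
    headers=HEADERS,
).json()
print(passage)
```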
---
## 6. MCP Server
Run the MCP stdio server:
```bash
python run_mcp_server.py
```
Tools exposed (operation names):
- `search` (query, limit)
- `retrieve` (doc_id, start, end)
- `retrieve_batch`
- `tree` (prefix, depth)
- `set_weight` (doc_id, multiplier)
- `status` / `version`
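To exercise these tools without an IDE, the upstream MCP Python SDK (`pip install mcp`) can drive the stdio server directly. A rough sketch (the SDK calls below belong to the upstream client API, not Nancy Brain; adjust the script path to your checkout):

```python
# Sketch: call the Nancy Brain MCP server over stdio via the upstream `mcp`
# Python SDK. The path and tool arguments are illustrative.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["run_mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "light curve", "limit": 5})
            print(result)

asyncio.run(main())
```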
### 6.1 VS Code Integration
1. Install a Model Context Protocol client extension (e.g. "MCP Explorer" or equivalent).
2. Add a server entry that points to the script and uses the stdio transport. Example config snippet:
```json
{
  "mcpServers": {
    "nancy-brain": {
      "command": "python",
      "args": ["/absolute/path/to/src/nancy-brain/run_mcp_server.py"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/src/nancy-brain"
      }
    }
  }
}
```
*Specific mamba environment example:*
```json
{
  "servers": {
    "nancy-brain": {
      "type": "stdio",
      "command": "/Users/malpas.1/.local/share/mamba/envs/nancy-brain/bin/python",
      "args": [
        "/Users/malpas.1/Code/slack-bot/src/nancy-brain/run_mcp_server.py"
      ],
      "env": {
        "PYTHONPATH": "/Users/malpas.1/Code/slack-bot/src/nancy-brain",
        "KMP_DUPLICATE_LIB_OK": "TRUE"
      }
    }
  },
  "inputs": []
}
```
3. Reload VS Code. The provider should list the tools; invoke `search` to test.
### 6.2 Claude Desktop
Claude Desktop supports MCP configuration in its settings file (`claude_desktop_config.json`). Add an entry similar to the above (command + args), restart Claude Desktop, and the tools appear in the prompt tools menu.
---
## 7. Use Cases & Examples
### For Researchers
```bash
# Add astronomy packages
nancy-brain add-repo https://github.com/astropy/astropy.git
nancy-brain add-repo https://github.com/rpoleski/MulensModel.git
# Add key research papers
nancy-brain add-article \
    "https://ui.adsabs.harvard.edu/link_gateway/1986ApJ...304....1P/PUB_PDF" \
    "Paczynski_1986_microlensing" \
    --category "foundational_papers" \
    --description "Paczynski (1986) - Gravitational microlensing by the galactic halo"
nancy-brain build
# AI can now answer: "How do I model a microlensing event?"
nancy-brain search "microlensing model fit"
```
### For ML Engineers
```bash
# Add ML frameworks
nancy-brain add-repo https://github.com/scikit-learn/scikit-learn.git
nancy-brain add-repo https://github.com/pytorch/pytorch.git
nancy-brain build
# AI can now answer: "Show me gradient descent implementation"
nancy-brain search "gradient descent optimizer"
```
### For Teams
```bash
# Launch web interface for non-technical users
nancy-brain ui
# Point team to http://localhost:8501
# They can search, add repos, manage articles, trigger builds visually
# Repository Management tab: Add GitHub repos
# Articles tab: Add PDF papers and documents
```
---
## 8. Slack Bot (Nancy)
The Slack-facing assistant lives outside this submodule (see parent repository). High-level steps:
1. Ensure HTTP API running and reachable (or embed service directly in bot process).
2. Bot receives a user message -> constructs a query -> calls `/search`, then `/retrieve` on selected hits for context.
3. Bot composes an answer including source references (doc_id and GitHub URL) before replying.
4. Optional: adaptively call `/weight` when feedback indicates a source should be boosted or dampened.
Check root-level `nancy_bot.py` or Slack integration docs (`SLACK.md`) for token setup and event subscription details.
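A compressed sketch of steps 2–3, using the HTTP API as the retrieval backend (the function and response field names here are illustrative, not Nancy's actual bot code):

```python
# Sketch of the bot-side retrieval flow (steps 2-3): search, pull passages
# for the top hits, and collect source references for the reply.
# Response field names (doc_id/id) are assumptions.
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer TEST"}

def gather_context(question: str, limit: int = 3):
    hits = requests.get(f"{BASE}/search", params={"query": question, "limit": limit},
                        headers=HEADERS).json()
    context, sources = [], []
    for hit in hits:
        doc_id = hit.get("doc_id") or hit.get("id")
        passage = requests.post(f"{BASE}/retrieve",
                                json={"doc_id": doc_id, "start": 0, "end": 40},
                                headers=HEADERS).json()
        context.append(passage)
        sources.append(doc_id)
    return context, sources  # hand context to the LLM, cite sources in the answer
```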
---
## 9. Custom GPT (OpenAI Actions / Function Calls)
Define OpenAI tool specs mapping to HTTP endpoints:
- `searchDocuments(query, limit)` -> GET /search
- `retrievePassage(doc_id, start, end)` -> POST /retrieve
- `listTree(prefix, depth)` -> GET /tree
- `setWeight(doc_id, multiplier)` -> POST /weight
Use an API gateway or the direct URL, include the auth header, and provide JSON schemas matching the request/response models.
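As a starting point, here is a function-spec sketch for `searchDocuments` (parameter names follow the endpoint table; descriptions and schemas should be tuned to the actual FastAPI models):

```python
# Sketch: an OpenAI-style tool/function spec mapping to GET /search.
# Parameter names follow the endpoint table; adjust to the real models.
search_documents_tool = {
    "type": "function",
    "function": {
        "name": "searchDocuments",
        "description": "Search the Nancy Brain knowledge base for relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search query"},
                "limit": {"type": "integer", "description": "Maximum results to return", "default": 5},
            },
            "required": ["query"],
        },
    },
}
```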
---
## 10. Dynamic Weighting Flow
1. Base score from embeddings (dual or single).
2. Extension multiplier (from `weights.yaml`).
3. Path multiplier(s) (cumulative).
4. Model weight (static config + runtime overrides via `/weight`).
5. Adjusted score = base * extension_weight * model_weight (and any path multipliers folded into extension weight step).
Runtime `/weight` takes effect immediately on subsequent searches.
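Concretely, the adjustment amounts to something like the following sketch (illustrative only; the real logic lives in `rag_core`):

```python
# Sketch of the scoring math above (illustrative; the real implementation
# lives in rag_core). Weights come from weights.yaml plus /weight overrides.
def adjusted_score(base, doc_id, extension_weights, path_weights, model_weights):
    ext = "." + doc_id.rsplit(".", 1)[-1] if "." in doc_id else ""
    weight = extension_weights.get(ext, 1.0)           # step 2: extension multiplier
    for substring, mult in path_weights.items():       # step 3: path_includes matches
        if substring in doc_id:
            weight *= mult
    weight *= model_weights.get(doc_id, 1.0)           # step 4: static + runtime weight
    return base * weight                               # step 5: adjusted score

# Example: boost a tutorial notebook under docs/
print(adjusted_score(0.42, "cat1/repoA/docs/tutorial.nb.txt",
                     {".txt": 1.0, ".py": 1.1}, {"docs/": 1.2}, {}))
```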
---
## 11. Updating / Rebuilding
| Action | Command |
|--------|---------|
| Pull repo updates | `nancy-brain build --force-update` or re-run build script |
| Change extension weights | Edit `config/weights.yaml`; restart (or rebuild) if the weights are cached rather than re-read at runtime |
| Change embedding model | Delete / rename existing `knowledge_base/embeddings` and rebuild with new env vars |
---
## 12. Deployment Notes
- Containerize: build an image with pre-built embeddings baked in, or mount a persistent volume.
- Health probe: `/health` returns 200 once the RAG service is initialized, 503 otherwise.
- Concurrency: FastAPI is async-safe; weight updates are simple dict writes (low contention). For heavy load, add a lock if races appear.
- Persistence of runtime weights: currently in-memory; persist manually if needed (extend `set_weight`), as sketched below.
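For that last point, persistence can be as simple as mirroring the in-memory dict to disk whenever a weight is set. A rough sketch (the `runtime_weights` dict and helper below are hypothetical, not existing Nancy Brain APIs):

```python
# Sketch: persist runtime weights to JSON whenever one changes.
# `runtime_weights` and this helper are hypothetical; wire the equivalent
# into the real set_weight handler.
import json
from pathlib import Path

WEIGHTS_FILE = Path("knowledge_base/runtime_weights.json")
runtime_weights = json.loads(WEIGHTS_FILE.read_text()) if WEIGHTS_FILE.exists() else {}

def set_weight_persistent(doc_id: str, multiplier: float) -> None:
    runtime_weights[doc_id] = multiplier
    WEIGHTS_FILE.write_text(json.dumps(runtime_weights, indent=2))
```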
---
## 13. Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| 503 RAG service not initialized | `initialize_rag_service` not called / wrong paths | Call initializer with correct embeddings path |
| Empty search results | Embeddings not built / wrong path | Re-run `nancy-brain build`, verify index directory |
| macOS OpenMP crash | MKL / libomp duplicate | Ensure `KMP_DUPLICATE_LIB_OK=TRUE` is set early (before heavy imports) |
| MCP tools not visible | Wrong path or PYTHONPATH | Use absolute paths in MCP config |
| CLI command not found | Package not installed | `pip install nancy-brain` |
Enable debug logging:
```bash
export LOG_LEVEL=DEBUG
```
(`LOG_LEVEL` is only honored if the application reads it; alternatively run `uvicorn --log-level debug`)
---
## 14. Development & Contributing
```bash
# Clone and set up development environment
git clone <repo-url>
cd nancy-brain
pip install -e ."[dev]"
# Run tests
pytest
# Run linting
black nancy_brain/
flake8 nancy_brain/
# Test CLI locally
nancy-brain --help
```
### Releasing
Nancy Brain uses automated versioning and PyPI publishing:
```bash
# Bump patch version (0.1.0 → 0.1.1)
./release.sh patch
# Bump minor version (0.1.0 → 0.2.0)
./release.sh minor
# Bump major version (0.1.0 → 1.0.0)
./release.sh major
```
This automatically:
1. Updates version numbers in `pyproject.toml` and `nancy_brain/__init__.py`
2. Creates a git commit and tag
3. Pushes to GitHub, triggering PyPI publication via GitHub Actions
Manual version management:
```bash
# See current version and bump options
bump-my-version show-bump
# Dry run (see what would change)
bump-my-version bump --dry-run patch
```
---
## 15. Roadmap (Optional)
- Persistence layer for runtime weights
- Additional retrieval filters (e.g. semantic rerank)
- Auth plugin / token validation
- VS Code extension
- Package publishing to PyPI
---
## 16. License
See parent repository license.
---
## 17. Minimal Verification Script
```bash
# After build & run
curl -H 'Authorization: Bearer TEST' 'http://localhost:8000/health'
```
Expect JSON with status + trace_id.
---
Happy searching.