# Nancy Brain
**Turn any GitHub repository into a searchable knowledge base for AI agents.**
Load the complete source code, documentation, examples, and notebooks from any package you're working with. Nancy Brain gives AI assistants instant access to:
- **Full source code** - actual Python classes, methods, implementation details
- **Live documentation** - tutorials, API docs, usage examples
- **Real examples** - Jupyter notebooks, test cases, configuration files
- **Smart weighting** - boost important docs, learning persists across sessions
The AI can now answer questions like "How do I initialize this class?" or "Show me an example of fitting a light curve" with actual code from the repositories you care about.
## 🚀 Quick Start
```bash
# Install anywhere
pip install nancy-brain
# Initialize a new project
nancy-brain init my-ai-project
cd my-ai-project
# Add some repositories
nancy-brain add-repo https://github.com/scikit-learn/scikit-learn.git
# Build the knowledge base
nancy-brain build
# Search it!
nancy-brain search "machine learning algorithms"
# Or launch the web interface
nancy-brain ui
```
## 🌐 Web Admin Interface
Launch the visual admin interface for easy knowledge base management:
```bash
nancy-brain ui
```
Features:
- **🔍 Live Search** - Test your knowledge base with instant results
- **📚 Repository Management** - Add/remove GitHub repos with visual forms
- **📄 Article Management** - Add/remove PDF articles with visual forms
- **🏗️ Build Control** - Trigger knowledge base builds with options
- **📊 System Status** - Check embeddings, configuration, and health
Perfect for non-technical users and rapid prototyping!
## 🖥️ Command Line Interface
```bash
nancy-brain init <project> # Initialize new project
nancy-brain add-repo <url> # Add GitHub repositories
nancy-brain add-article <url> <name> # Add PDF articles
nancy-brain build # Build knowledge base
nancy-brain search "query" # Search knowledge base
nancy-brain serve # Start HTTP API server
nancy-brain ui # Launch web admin interface
```
## Technical Architecture
A lightweight Retrieval-Augmented Generation (RAG) knowledge base with:
- Embedding + search pipeline (txtai / FAISS based)
- HTTP API connector (FastAPI)
- Model Context Protocol (MCP) server connector (tools for search / retrieve / tree / weight)
- Dynamic weighting system (extension/path weights + runtime doc preferences)
Designed to power AI assistants in Slack, IDEs, Claude Desktop, custom GPTs, and any MCP-capable client.
---
## 1. Installation & Quick Setup
### For Users (Recommended)
```bash
# Install the package
pip install nancy-brain
# Initialize a new project
nancy-brain init my-knowledge-base
cd my-knowledge-base
# Add repositories and build
nancy-brain add-repo https://github.com/your-org/repo.git
nancy-brain add-article "https://arxiv.org/pdf/paper.pdf" "paper_name" --description "Important paper"
nancy-brain build
# Launch web interface
nancy-brain ui
```
### For Developers
```bash
# Clone and install in development mode
git clone <repo-url>
cd nancy-brain
pip install -e ."[dev]"
# Test installation
pytest -q
nancy-brain --help
```
---
## 2. Project Layout (Core Parts)
```
nancy_brain/                   # Main Python package
├── cli.py                     # Command line interface
├── admin_ui.py                # Streamlit web admin interface
└── __init__.py                # Package initialization

connectors/http_api/app.py     # FastAPI app
connectors/mcp_server/         # MCP server implementation
rag_core/                      # Core service, search, registry, store, types
scripts/                       # KB build & management scripts
config/repositories.yml        # Source repository list (input KB)
config/weights.yaml            # Extension + path weighting config
config/model_weights.yaml      # (Optional) static per-doc multipliers
```
---
## 3. Configuration
### 3.1 Repositories (`config/repositories.yml`)
Structure (categories map to lists of repos):
```yaml
<category_name>:
  - name: repoA
    url: https://github.com/org/repoA.git
  - name: repoB
    url: https://github.com/org/repoB.git
```
Categories become path prefixes inside the knowledge base (e.g. `cat1/repoA/...`).
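For orientation, here is a minimal sketch of how this file maps to knowledge-base prefixes (it assumes the shape above and requires PyYAML; it is not part of the package API):

```python
# Sketch: derive knowledge-base doc_id prefixes from repositories.yml.
# Assumes the category -> list-of-repos shape shown above; requires PyYAML.
import yaml

with open("config/repositories.yml") as fh:
    categories = yaml.safe_load(fh)

for category, repos in categories.items():
    for repo in repos:
        prefix = f"{category}/{repo['name']}"  # e.g. "cat1/repoA/..."
        print(f"{prefix:<30} <- {repo['url']}")
```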
### 3.2 Weight Config (`config/weights.yaml`)
- `extensions`: base multipliers by file extension (.py, .md, etc.)
- `path_includes`: substring patterns matched against the doc_id; each matching entry multiplies the weight further.
### 3.3 Model Weights (`config/model_weights.yaml`)
Optional static per-document multipliers (legacy / seed values). Runtime updates via the `/weight` endpoint or the MCP `set_weight` tool override or augment these in-memory weights.
### 3.4 Environment Variables
| Var | Purpose | Default |
|-----|---------|---------|
| `USE_DUAL_EMBEDDING` | Enable dual (general + code) embedding scoring | true |
| `CODE_EMBEDDING_MODEL` | Model name for code index (if dual) | microsoft/codebert-base |
| `KMP_DUPLICATE_LIB_OK` | Set to TRUE to avoid OpenMP macOS clash | TRUE |
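If you prefer to pin these in code rather than in the shell, a tiny sketch (values mirror the defaults above; where the service actually reads them is an implementation detail):

```python
# Sketch: pin the environment before launching the service.
# Values mirror the defaults in the table above.
import os

os.environ.setdefault("USE_DUAL_EMBEDDING", "true")
os.environ.setdefault("CODE_EMBEDDING_MODEL", "microsoft/codebert-base")
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")  # avoids the macOS OpenMP clash
```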
---
## 4. Building the Knowledge Base
Embeddings must be built before meaningful search.
### Using the CLI (Recommended)
```bash
# Basic build (repositories only)
nancy-brain build
# Build with PDF articles (if configured)
nancy-brain build --articles-config config/articles.yml
# Force update all repositories
nancy-brain build --force-update
# Or use the web interface
nancy-brain ui # Go to "Build Knowledge Base" page
```
### Using the Python Script Directly
```bash
conda activate nancy-brain
cd src/nancy-brain
# Basic build (repositories only)
python scripts/build_knowledge_base.py \
    --config config/repositories.yml \
    --embeddings-path knowledge_base/embeddings

# Full build including optional PDF articles (if config/articles.yml exists)
python scripts/build_knowledge_base.py \
    --config config/repositories.yml \
    --articles-config config/articles.yml \
    --base-path knowledge_base/raw \
    --embeddings-path knowledge_base/embeddings \
    --force-update \
    --dirty
# Omit the --dirty flag to automatically remove raw source material
# after indexing is complete
```
Run `python scripts/build_knowledge_base.py -h` for all options.
### 4.1 PDF Articles (Optional Quick Setup)
1. Create `config/articles.yml` (example):
```yaml
journal_articles:
  - name: Paczynski_1986_ApJ_304_1
    url: https://ui.adsabs.harvard.edu/link_gateway/1986ApJ...304....1P/PUB_PDF
    description: Paczynski (1986) – Gravitational microlensing
```
2. Install Java (for Tika PDF extraction) – macOS:
```bash
brew install openjdk
export JAVA_HOME="/opt/homebrew/opt/openjdk"
export PATH="$JAVA_HOME/bin:$PATH"
```
3. (Optional fallback only) Install lightweight PDF libs if you skip Java:
```bash
pip install PyPDF2 pdfplumber
```
4. Build with articles (explicit):
```bash
python scripts/build_knowledge_base.py --config config/repositories.yml --articles-config config/articles.yml
```
5. Keep raw PDFs for inspection: add `--dirty`.
Notes:
- If Java/Tika is not available, the script attempts fallback extraction (requires PyPDF2, pdfplumber, or fitz/PyMuPDF).
- Cleanup removes raw PDFs unless `--dirty` is supplied.
- Article docs are indexed under `journal_articles/<category>/<name>`.
Key flags:
- `--config` path to repositories YAML (was `--repositories` in older docs)
- `--articles-config` optional PDF articles YAML
- `--base-path` where raw repos/PDFs live (default knowledge_base/raw)
- `--embeddings-path` output index directory
- `--force-update` re-pull repos / re-download PDFs
- `--category <name>` limit to one category
- `--dry-run` show planned actions without performing them
- `--dirty` keep raw sources (skip cleanup)
This will:
1. Clone / update listed repos under `knowledge_base/raw/<category>/<repo>`
2. (Optionally) download PDFs into category directories
3. Convert notebooks (*.ipynb -> *.nb.txt) if nb4llm is available
4. Extract and normalize text (and, optionally, PDF text)
5. Build / update embeddings index at `knowledge_base/embeddings` (and `code_index` if dual embeddings enabled)
Re-run when repositories or articles change.
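To sanity-check a finished build outside the CLI, you can query the index directly with txtai. A sketch (it assumes the default `knowledge_base/embeddings` path holds a standard txtai index):

```python
# Sketch: load the built index with txtai and run a quick test query.
# Assumes the default embeddings path and a standard txtai index layout.
from txtai.embeddings import Embeddings

embeddings = Embeddings()
embeddings.load("knowledge_base/embeddings")

for hit in embeddings.search("light curve fitting", 5):
    print(hit)  # (id, score) tuples or dicts, depending on index configuration
```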
---
## 5. Running Services
### Web Admin Interface (Recommended for Getting Started)
```bash
nancy-brain ui
# Opens Streamlit interface at http://localhost:8501
# Features: search, repo management, build control, status
```
### HTTP API Server
```bash
# Using CLI
nancy-brain serve
# Or directly with uvicorn
uvicorn connectors.http_api.app:app --host 0.0.0.0 --port 8000
```
### MCP Server (for AI Assistants)
```bash
# Run MCP stdio server
python run_mcp_server.py
```
Initialize service programmatically (example pattern):
```python
from pathlib import Path
from connectors.http_api.app import initialize_rag_service
initialize_rag_service(
    config_path=Path('config/repositories.yml'),
    embeddings_path=Path('knowledge_base/embeddings'),
    weights_path=Path('config/weights.yaml'),
    use_dual_embedding=True,
)
```
The FastAPI dependency layer will then serve requests.
### Command Line Search
```bash
# Quick search from command line
nancy-brain search "machine learning algorithms" --limit 5
# Search with custom paths
nancy-brain search "neural networks" \
--embeddings-path custom/embeddings \
--config custom/repositories.yml
```
### 5.1 Endpoints (Bearer auth placeholder)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Service status |
| GET | `/version` | Index / build meta |
| GET | `/search?query=...&limit=N` | Search documents |
| POST | `/retrieve` | Retrieve passage (doc_id + line range) |
| POST | `/retrieve/batch` | Batch retrieve |
| GET | `/tree?prefix=...` | List KB tree |
| POST | `/weight` | Set runtime doc weight |
Example:
```bash
curl -H "Authorization: Bearer TEST" 'http://localhost:8000/search?query=light%20curve&limit=5'
```
Set a document weight (boost factor 0.5–2.0 typical):
```bash
curl -X POST -H 'Authorization: Bearer TEST' \
    -H 'Content-Type: application/json' \
    -d '{"doc_id":"cat1/repoA/path/file.py","multiplier":2.0}' \
    http://localhost:8000/weight
```
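The same endpoints are easy to drive from Python with `requests`. A sketch (the `/retrieve` payload fields `doc_id`, `start`, `end` are assumed from the table and the MCP tool signatures; check the FastAPI models for the exact schema):

```python
# Sketch: call the HTTP API from Python. The /retrieve payload fields are
# assumed from the endpoint table and MCP tool signatures; verify against
# the FastAPI request models.
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer TEST"}

hits = requests.get(
    f"{BASE}/search",
    params={"query": "light curve", "limit": 5},
    headers=HEADERS,
).json()
print(hits)

passage = requests.post(
    f"{BASE}/retrieve",
    json={"doc_id": "cat1/repoA/path/file.py", "start": 0, "end": 40},
    headers=HEADERS,
).json()
print(passage)
```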
---
## 6. MCP Server
Run the MCP stdio server:
```bash
python run_mcp_server.py
```
Tools exposed (operation names):
- `search` (query, limit)
- `retrieve` (doc_id, start, end)
- `retrieve_batch`
- `tree` (prefix, depth)
- `set_weight` (doc_id, multiplier)
- `status` / `version`
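To exercise these tools without an IDE, the upstream MCP Python SDK (`pip install mcp`) can drive the stdio server directly. A rough sketch (the SDK calls below belong to the upstream client API, not Nancy Brain; adjust the script path to your checkout):

```python
# Sketch: call the Nancy Brain MCP server over stdio via the upstream `mcp`
# Python SDK. The path and tool arguments are illustrative.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["run_mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "light curve", "limit": 5})
            print(result)

asyncio.run(main())
```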
### 6.1 VS Code Integration
1. Install a Model Context Protocol client extension (e.g. "MCP Explorer" or equivalent).
2. Add a server entry that points to the script and uses the stdio transport. Example config snippet:
```json
{
  "mcpServers": {
    "nancy-brain": {
      "command": "python",
      "args": ["/absolute/path/to/src/nancy-brain/run_mcp_server.py"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/src/nancy-brain"
      }
    }
  }
}
```
*Specific mamba environment example:*
```json
{
  "servers": {
    "nancy-brain": {
      "type": "stdio",
      "command": "/Users/malpas.1/.local/share/mamba/envs/nancy-brain/bin/python",
      "args": [
        "/Users/malpas.1/Code/slack-bot/src/nancy-brain/run_mcp_server.py"
      ],
      "env": {
        "PYTHONPATH": "/Users/malpas.1/Code/slack-bot/src/nancy-brain",
        "KMP_DUPLICATE_LIB_OK": "TRUE"
      }
    }
  },
  "inputs": []
}
```
3. Reload VS Code. The provider should list the tools; invoke `search` to test.
### 6.2 Claude Desktop
Claude Desktop supports MCP configuration in its settings file (`claude_desktop_config.json`). Add an entry similar to the above (command + args), restart Claude Desktop, and the tools appear in the prompt tools menu.
---
## 7. Use Cases & Examples
### For Researchers
```bash
# Add astronomy packages
nancy-brain add-repo https://github.com/astropy/astropy.git
nancy-brain add-repo https://github.com/rpoleski/MulensModel.git
# Add key research papers
nancy-brain add-article \
    "https://ui.adsabs.harvard.edu/link_gateway/1986ApJ...304....1P/PUB_PDF" \
    "Paczynski_1986_microlensing" \
    --category "foundational_papers" \
    --description "Paczynski (1986) - Gravitational microlensing by the galactic halo"
nancy-brain build
# AI can now answer: "How do I model a microlensing event?"
nancy-brain search "microlensing model fit"
```
### For ML Engineers
```bash
# Add ML frameworks
nancy-brain add-repo https://github.com/scikit-learn/scikit-learn.git
nancy-brain add-repo https://github.com/pytorch/pytorch.git
nancy-brain build
# AI can now answer: "Show me gradient descent implementation"
nancy-brain search "gradient descent optimizer"
```
### For Teams
```bash
# Launch web interface for non-technical users
nancy-brain ui
# Point team to http://localhost:8501
# They can search, add repos, manage articles, trigger builds visually
# Repository Management tab: Add GitHub repos
# Articles tab: Add PDF papers and documents
```
---
## 8. Slack Bot (Nancy)
The Slack-facing assistant lives outside this submodule (see parent repository). High-level steps:
1. Ensure HTTP API running and reachable (or embed service directly in bot process).
2. Bot receives a user message -> constructs a query -> calls `/search`, then `/retrieve` on selected hits for context.
3. Bot composes an answer including source references (doc_id and GitHub URL) before replying.
4. Optional: adaptively call `/weight` when feedback indicates a source should be boosted or dampened.
Check root-level `nancy_bot.py` or Slack integration docs (`SLACK.md`) for token setup and event subscription details.
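A compressed sketch of steps 2–3, using the HTTP API as the retrieval backend (the function and response field names here are illustrative, not Nancy's actual bot code):

```python
# Sketch of the bot-side retrieval flow (steps 2-3): search, pull passages
# for the top hits, and collect source references for the reply.
# Response field names (doc_id/id) are assumptions.
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer TEST"}

def gather_context(question: str, limit: int = 3):
    hits = requests.get(f"{BASE}/search", params={"query": question, "limit": limit},
                        headers=HEADERS).json()
    context, sources = [], []
    for hit in hits:
        doc_id = hit.get("doc_id") or hit.get("id")
        passage = requests.post(f"{BASE}/retrieve",
                                json={"doc_id": doc_id, "start": 0, "end": 40},
                                headers=HEADERS).json()
        context.append(passage)
        sources.append(doc_id)
    return context, sources  # hand context to the LLM, cite sources in the answer
```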
---
## 9. Custom GPT (OpenAI Actions / Function Calls)
Define OpenAI tool specs mapping to HTTP endpoints:
- `searchDocuments(query, limit)` -> GET /search
- `retrievePassage(doc_id, start, end)` -> POST /retrieve
- `listTree(prefix, depth)` -> GET /tree
- `setWeight(doc_id, multiplier)` -> POST /weight
Use an API gateway or the direct URL, include the auth header, and provide JSON schemas matching the request/response models.
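As a starting point, here is a function-spec sketch for `searchDocuments` (parameter names follow the endpoint table; descriptions and schemas should be tuned to the actual FastAPI models):

```python
# Sketch: an OpenAI-style tool/function spec mapping to GET /search.
# Parameter names follow the endpoint table; adjust to the real models.
search_documents_tool = {
    "type": "function",
    "function": {
        "name": "searchDocuments",
        "description": "Search the Nancy Brain knowledge base for relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search query"},
                "limit": {"type": "integer", "description": "Maximum results to return", "default": 5},
            },
            "required": ["query"],
        },
    },
}
```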
---
## 10. Dynamic Weighting Flow
1. Base score from embeddings (dual or single).
2. Extension multiplier (from `weights.yaml`).
3. Path multiplier(s) (cumulative).
4. Model weight (static config + runtime overrides via `/weight`).
5. Adjusted score = base * extension_weight * model_weight (and any path multipliers folded into extension weight step).
Runtime `/weight` takes effect immediately on subsequent searches.
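Concretely, the adjustment amounts to something like the following sketch (illustrative only; the real logic lives in `rag_core`):

```python
# Sketch of the scoring math above (illustrative; the real implementation
# lives in rag_core). Weights come from weights.yaml plus /weight overrides.
def adjusted_score(base, doc_id, extension_weights, path_weights, model_weights):
    ext = "." + doc_id.rsplit(".", 1)[-1] if "." in doc_id else ""
    weight = extension_weights.get(ext, 1.0)           # step 2: extension multiplier
    for substring, mult in path_weights.items():       # step 3: path_includes matches
        if substring in doc_id:
            weight *= mult
    weight *= model_weights.get(doc_id, 1.0)           # step 4: static + runtime weight
    return base * weight                               # step 5: adjusted score

# Example: boost a tutorial notebook under docs/
print(adjusted_score(0.42, "cat1/repoA/docs/tutorial.nb.txt",
                     {".txt": 1.0, ".py": 1.1}, {"docs/": 1.2}, {}))
```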
---
## 11. Updating / Rebuilding
| Action | Command |
|--------|---------|
| Pull repo updates | `nancy-brain build --force-update` or re-run build script |
| Change extension weights | Edit `config/weights.yaml`; restart (or rebuild) if the weights are cached rather than re-read at runtime |
| Change embedding model | Delete / rename existing `knowledge_base/embeddings` and rebuild with new env vars |
---
## 12. Deployment Notes
- Containerize: build an image with pre-built embeddings baked in, or mount a persistent volume.
- Health probe: `/health` returns 200 once the RAG service is initialized, 503 otherwise.
- Concurrency: FastAPI is async-safe; weight updates are simple dict writes (low contention). For heavy load, add a lock if races appear.
- Persistence of runtime weights: currently in-memory; persist manually if needed (extend `set_weight`), as sketched below.
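For that last point, persistence can be as simple as mirroring the in-memory dict to disk whenever a weight is set. A rough sketch (the `runtime_weights` dict and helper below are hypothetical, not existing Nancy Brain APIs):

```python
# Sketch: persist runtime weights to JSON whenever one changes.
# `runtime_weights` and this helper are hypothetical; wire the equivalent
# into the real set_weight handler.
import json
from pathlib import Path

WEIGHTS_FILE = Path("knowledge_base/runtime_weights.json")
runtime_weights = json.loads(WEIGHTS_FILE.read_text()) if WEIGHTS_FILE.exists() else {}

def set_weight_persistent(doc_id: str, multiplier: float) -> None:
    runtime_weights[doc_id] = multiplier
    WEIGHTS_FILE.write_text(json.dumps(runtime_weights, indent=2))
```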
---
## 13. Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| 503 RAG service not initialized | `initialize_rag_service` not called / wrong paths | Call initializer with correct embeddings path |
| Empty search results | Embeddings not built / wrong path | Re-run `nancy-brain build`, verify index directory |
| macOS OpenMP crash | MKL / libomp duplicate | Ensure `KMP_DUPLICATE_LIB_OK=TRUE` is set early (before heavy imports) |
| MCP tools not visible | Wrong path or PYTHONPATH | Use absolute paths in MCP config |
| CLI command not found | Package not installed | `pip install nancy-brain` |
Enable debug logging:
```bash
export LOG_LEVEL=DEBUG
```
(`LOG_LEVEL` is only honored if the application reads it; alternatively run `uvicorn --log-level debug`)
---
## 14. Development & Contributing
```bash
# Clone and set up development environment
git clone <repo-url>
cd nancy-brain
pip install -e ."[dev]"
# Run tests
pytest
# Run linting
black nancy_brain/
flake8 nancy_brain/
# Test CLI locally
nancy-brain --help
```
### Releasing
Nancy Brain uses automated versioning and PyPI publishing:
```bash
# Bump patch version (0.1.0 → 0.1.1)
./release.sh patch
# Bump minor version (0.1.0 → 0.2.0)
./release.sh minor
# Bump major version (0.1.0 → 1.0.0)
./release.sh major
```
This automatically:
1. Updates version numbers in `pyproject.toml` and `nancy_brain/__init__.py`
2. Creates a git commit and tag
3. Pushes to GitHub, triggering PyPI publication via GitHub Actions
Manual version management:
```bash
# See current version and bump options
bump-my-version show-bump
# Dry run (see what would change)
bump-my-version bump --dry-run patch
```
---
## 15. Roadmap (Optional)
- Persistence layer for runtime weights
- Additional retrieval filters (e.g. semantic rerank)
- Auth plugin / token validation
- VS Code extension
- Package publishing to PyPI
---
## 16. License
See parent repository license.
---
## 17. Minimal Verification Script
```bash
# After build & run
curl -H 'Authorization: Bearer TEST' 'http://localhost:8000/health'
```
Expect JSON with status + trace_id.
---
Happy searching.