semanticscout


Name: semanticscout
Version: 3.4.0
home_page: None
Summary: A language-aware semantic code search MCP server with intelligent filtering and 9.3x better dependency analysis
upload_time: 2025-10-15 19:06:46
maintainer: None
docs_url: None
author: None
requires_python: >=3.10
license: MIT
keywords: mcp, context-engine, code-search, vector-database, ai-agents, semantic-search, ollama, chromadb
project_urls: https://github.com/Psynosaur/SemanticScout (Homepage, Repository, Issues, Documentation)
requirements: No requirements were recorded.
# SemanticScout 🔍
## Please note: this is just an idea project, an attempt to build something useful outside the Augment Code world
I still need to refactor a lot of rough code and implement several key changes.

> Language-aware semantic code search for AI agents with dependency analysis

[![Version](https://img.shields.io/badge/version-3.4.0-blue)]()
[![Tests](https://img.shields.io/badge/tests-passing-brightgreen)]()
[![Coverage](https://img.shields.io/badge/coverage-55%25-green)]()
[![Python](https://img.shields.io/badge/python-3.10+-blue)]()
[![License](https://img.shields.io/badge/license-MIT-blue)]()

**SemanticScout** is a Model Context Protocol (MCP) server that provides intelligent code search for AI agents. It combines semantic search with language-aware analysis to understand code relationships, dependencies, and architecture.

## ✨ Key Features

- 🎯 **Language-Aware Analysis** - Automatic language detection with specialized dependency analysis (Rust, C#, Python, etc.)
- 🔍 **Semantic Code Search** - Natural language queries with 100% accuracy and intelligent context expansion
- 🧠 **Query Intent Tracking** - Meta-learning system that improves search performance over time
- 🚫 **Smart Test Filtering** - Automatically excludes test files (0% test pollution) with multi-strategy detection
- 🗂️ **Git Integration** - Smart filtering of untracked files and incremental indexing (5-10x faster updates)
- 🔄 **Hybrid Retrieval** - Combines semantic, symbol, and dependency-based search with AST parsing
- ⚡ **High Performance** - Local embeddings (sentence-transformers), <100ms queries, <2s per file indexing
- 🌐 **Multi-Language** - TypeScript, JavaScript, Python, Java, C#, Go, Rust, Ruby, PHP, C, C++
- 🤖 **MCP Ready** - Works with Claude Desktop and other MCP clients out of the box

## 🚀 Quick Start

Get started in **under 2 minutes** with zero configuration required!

### Prerequisites

- **uv** - [Install uv](https://docs.astral.sh/uv/getting-started/installation/)
- **Claude Desktop** - [Install Claude Desktop](https://claude.ai/download)

### Setup

1. **Configure Claude Desktop** - Add to your MCP configuration file:

**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
**Mac:** `~/Library/Application Support/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}
```

2. **Restart Claude Desktop** - SemanticScout will be automatically downloaded and ready to use!

**✨ What you get:**
- Language-aware analysis with automatic project detection
- Fast local embeddings (sentence-transformers, no Ollama needed)
- Smart test file filtering and git integration
- All data stored in `~/semanticscout/`

> **Note:** Use Python 3.12 for best compatibility. Some dependencies don't yet support Python 3.13.

## 📖 Usage

Once configured, use natural language to interact with SemanticScout through Claude:

### Example Conversations

**Index a codebase:**
```
You: "Index my codebase at /workspace"
Claude: [Calls index_codebase tool and shows indexing progress]
```

**Search for code:**
```
You: "Find the authentication logic"
Claude: [Calls search_code tool and shows relevant code snippets]
```

**Advanced queries:**
```
You: "Show me dependency injection configuration"
Claude: [Automatically detects architectural query and expands coverage]
```

### Available Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `index_codebase` | Index a codebase with language-aware analysis | `path`, `incremental` |
| `search_code` | Search with natural language + smart filtering | `query`, `collection_name`, `exclude_test_files` |
| `find_symbol` | Find symbols with language-aware lookup | `symbol_name`, `collection_name` |
| `trace_dependencies` | Trace dependency chains | `file_path`, `collection_name`, `depth` |
| `list_collections` | List all indexed codebases | None |
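
These tools are exposed over the standard MCP protocol, so any MCP client can invoke them, not just Claude Desktop. Below is a minimal sketch using the official `mcp` Python SDK to launch the server over stdio and call two of the tools; the tool and parameter names come from the table above, while the collection name and result handling are illustrative assumptions.

```python
# Minimal sketch: driving SemanticScout from a generic MCP client.
# Assumes the official `mcp` Python SDK; tool and parameter names are
# taken from the table above, the collection name is a hypothetical example.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="uvx",
    args=["--python", "3.12", "semanticscout@latest"],
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Index a codebase, then search it with natural language.
            await session.call_tool("index_codebase", {"path": "/workspace"})
            result = await session.call_tool(
                "search_code",
                {"query": "authentication logic", "collection_name": "workspace"},
            )
            print(result)

asyncio.run(main())
```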

### Advanced Features

- **Incremental Indexing**: Use `incremental=True` for 5-10x faster updates on existing codebases
- **Test Filtering**: Set `exclude_test_files=False` to include test files in search results
- **Coverage Modes**: Use `coverage_mode` for different result depths (focused/balanced/comprehensive/exhaustive)
- **Real-time Updates**: Process file change events from editors automatically
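
To make these options concrete, here is a hedged sketch of the argument payloads such calls might take; the parameter names come from the list above, while the values (collection name, chosen coverage mode) are purely illustrative.

```python
# Illustrative argument payloads for the advanced options above; the
# parameter names come from the feature list, the values are hypothetical.
search_args = {
    "query": "dependency injection configuration",
    "collection_name": "workspace",     # hypothetical collection name
    "exclude_test_files": False,        # include test files in the results
    "coverage_mode": "comprehensive",   # focused | balanced | comprehensive | exhaustive
}

index_args = {
    "path": "/workspace",
    "incremental": True,                # re-index only changed files (5-10x faster)
}
```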

## 🔧 Configuration

### Default Setup (Recommended)
The default configuration works great for most users - no additional setup needed!

### Custom Embedding Models
To use a different sentence-transformers model:

```json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"sentence-transformers\",\"model\":\"all-mpnet-base-v2\"}}"
      }
    }
  }
}
```
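
Because `SEMANTICSCOUT_CONFIG_JSON` is JSON embedded inside a JSON string, the escaping is easy to get wrong by hand. A small standard-library Python one-off (nothing SemanticScout-specific) can generate the escaped value for you:

```python
# Generate the escaped SEMANTICSCOUT_CONFIG_JSON value shown above.
# The inner json.dumps builds the config JSON; the outer one escapes it
# so it can be pasted as a string value in claude_desktop_config.json.
import json

config = {"embedding": {"provider": "sentence-transformers", "model": "all-mpnet-base-v2"}}
print(json.dumps(json.dumps(config, separators=(",", ":"))))
```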

### Ollama (Optional - GPU Acceleration)
For GPU acceleration with Ollama:

```bash
# Start Ollama and pull model
ollama serve
ollama pull nomic-embed-text
```

```json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text",
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"ollama\"}}"
      }
    }
  }
}
```
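
Before pointing SemanticScout at Ollama, it can help to confirm that the server is reachable and the embedding model has actually been pulled. A quick check against Ollama's `/api/tags` endpoint (which lists locally available models) might look like this:

```python
# Sanity-check that Ollama is running and nomic-embed-text has been pulled
# before switching SemanticScout's embedding provider to it. Uses Ollama's
# /api/tags endpoint, which lists locally available models.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]

print("Available models:", models)
if not any(name.startswith("nomic-embed-text") for name in models):
    print("nomic-embed-text not found - run: ollama pull nomic-embed-text")
```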

## 🐛 Troubleshooting

### Common Issues

**Python Version Error:** Use Python 3.12 for best compatibility (some dependencies don't support 3.13 yet)

**Ollama Not Available:** The default uses sentence-transformers (no Ollama needed). Only configure Ollama if you want GPU acceleration.

**Rate Limits:** Adjust limits with environment variables:
```json
"env": {
  "MAX_INDEXING_REQUESTS_PER_HOUR": "20",
  "MAX_SEARCH_REQUESTS_PER_MINUTE": "200"
}
```

## 📚 Documentation

- **[API Reference](docs/API_REFERENCE.md)** - Complete tool documentation
- **[User Guide](docs/USER_GUIDE.md)** - Examples and best practices
- **[Configuration](docs/CONFIGURATION.md)** - Advanced configuration options
- **[Performance Tuning](docs/PERFORMANCE_TUNING.md)** - Optimization guide

## 🏗️ Architecture

SemanticScout combines multiple technologies for intelligent code search:

- **Language Detection** → **AST Parsing** (tree-sitter) → **Symbol Extraction**
- **Semantic Chunking** → **Embeddings** (sentence-transformers/Ollama) → **Vector Storage** (ChromaDB)
- **Dependency Analysis** → **Graph Storage** (NetworkX) → **Symbol Tables** (SQLite)
- **Hybrid Search** → **Context Expansion** → **Smart Filtering**
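
To make the middle row of that pipeline concrete, here is a small, self-contained sketch of the embed-and-store stage using the public APIs of sentence-transformers and ChromaDB. This is not SemanticScout's internal code; the chunk IDs, contents, and model choice are illustrative.

```python
# Minimal illustration of the embed-and-store stage: sentence-transformers
# for local embeddings, ChromaDB for vector storage and retrieval.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")          # any local embedding model
client = chromadb.PersistentClient(path="./vector_store")
collection = client.get_or_create_collection("demo_codebase")

# "Chunks" would normally come from AST-aware semantic chunking.
chunks = {
    "auth.py:login": "def login(user, password): ...",
    "auth.py:hash_password": "def hash_password(password): ...",
}
collection.add(
    ids=list(chunks.keys()),
    documents=list(chunks.values()),
    embeddings=model.encode(list(chunks.values())).tolist(),
)

# Natural-language query -> embed -> nearest-neighbour lookup.
query_embedding = model.encode(["where is the authentication logic?"]).tolist()
results = collection.query(query_embeddings=query_embedding, n_results=2)
print(results["ids"], results["distances"])
```

SemanticScout layers the language-aware pieces (tree-sitter parsing, dependency graphs, symbol tables) on top of this basic retrieval loop.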

## 🤝 Contributing

Contributions welcome! See our [contributing guide](CONTRIBUTING.md) for details.

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

---

**Built with ❤️ for the AI agent ecosystem**


            
