# Inkognito
Privacy-first document processing FastMCP server. Extract, anonymize, and segment documents through FastMCP's modern tool interface.
Please note: because Inkognito runs as an MCP server, the privacy of file contents cannot be absolutely guaranteed, but it is a central design consideration. File _contents_ should carry a low (but non-zero) risk of leakage; file _names_, however, will unavoidably and by design be read and written by the MCP. Plan accordingly, and consider using a local model.
## Quick Start
### Installation
```bash
# Install via pip
pip install inkognito
# Or via uvx (no Python setup needed)
uvx inkognito
# Or run directly with FastMCP
fastmcp run inkognito
```
### Configure Claude Desktop
You also need a filesystem MCP server; add one if it is not already configured.
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "inkognito": {
      "command": "uvx",
      "args": ["inkognito"],
      "env": {}
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/you/input-files-or-whatever",
        "/Users/you/output-folder-if-you-want-one"
      ],
      "env": {}
    }
  }
}
```

JSON does not allow comments, so leave `env` empty for now. Once the cloud extractors are implemented, add `AZURE_DI_KEY` or `LLAMAPARSE_API_KEY` to the `inkognito` server's `env` object.
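If you prefer to merge these entries into an existing config rather than hand-edit it, the sketch below shows one way to do it. `add_inkognito_servers` is a hypothetical helper, not part of Inkognito; it leaves existing entries untouched, and because it works on plain dicts and `json.dumps`, the result is always strict JSON with no stray comments.

```python
import json


def add_inkognito_servers(config: dict, allowed_dirs: list[str]) -> dict:
    """Return a copy of a Claude Desktop config with the inkognito and
    filesystem MCP servers added; existing entries are not overwritten."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    servers = merged.setdefault("mcpServers", {})
    servers.setdefault("inkognito", {"command": "uvx", "args": ["inkognito"], "env": {}})
    servers.setdefault(
        "filesystem",
        {
            "command": "npx",
            # Directories the filesystem server may read and write.
            "args": ["-y", "@modelcontextprotocol/server-filesystem", *allowed_dirs],
            "env": {},
        },
    )
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    updated = add_inkognito_servers(existing, ["/Users/you/input", "/Users/you/output"])
    print(json.dumps(updated, indent=2))
```

To apply it for real, read the file with `json.load`, pass the result through the helper, and write it back with `json.dump`; the config file's location depends on your operating system.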
### Basic Usage
In Claude Desktop:
```
"Extract this PDF to markdown"
"Anonymize all documents in my contracts folder"
"Split this large document into chunks for processing"
"Create individual prompts from this documentation"
```
## Features
### 🔒 Privacy-First Anonymization
- Universal PII detection (50+ types)
- Consistent replacements across all documents
- Reversible with secure vault file
- No configuration needed - smart defaults
### 📄 Multiple Extraction Options
- **Available Now**: Docling (default, with OCR support)
- **Planned**: Azure DI, LlamaIndex, MinerU (placeholders only)
- Auto-selects the best available extractor
- Falls back to Docling when no cloud extractor is available
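The auto-selection and fallback behaviour described above can be sketched as a priority list. This is a hypothetical illustration, not the code in `extractors/registry.py`: the fastest-first ordering follows the Performance table, availability is assumed to mean "API key set" for cloud extractors and "`magic_pdf` importable" for MinerU, and only extractors in `implemented` are considered.

```python
import importlib.util
import os
from collections.abc import Mapping

# Only Docling works today; the other extractors are placeholders.
IMPLEMENTED = {"docling"}


def is_available(name: str, env: Mapping[str, str]) -> bool:
    """Hypothetical availability check for each extractor."""
    if name == "azure_di":
        return bool(env.get("AZURE_DI_KEY"))
    if name == "llamaindex":
        return bool(env.get("LLAMAPARSE_API_KEY"))
    if name == "mineru":
        # Assumes the magic-pdf library is importable as `magic_pdf`.
        return importlib.util.find_spec("magic_pdf") is not None
    return name == "docling"  # local default, assumed always present


def select_extractor(env: Mapping[str, str] = os.environ,
                     implemented: set[str] = IMPLEMENTED) -> str:
    """Pick the fastest usable extractor, falling back to Docling."""
    for name in ("azure_di", "llamaindex", "mineru"):  # fastest first
        if name in implemented and is_available(name, env):
            return name
    return "docling"
```

With the current placeholders, `select_extractor()` always returns `"docling"`, even when an API key is set.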
### ✂️ Intelligent Segmentation
- **Large documents**: 10k-30k token chunks
- **Prompt generation**: Split by headings
- Preserves context and structure
- Markdown-native processing
## FastMCP Tools
All tools are exposed through FastMCP's modern interface with automatic progress reporting and error handling.
### anonymize_documents
Replace PII with consistent fake data across multiple files.
```python
anonymize_documents(
    directory="/path/to/docs",
    output_dir="/secure/output"
)
```
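"Consistent" here means that the same original value gets the same replacement in every file. The following minimal sketch illustrates the idea and is not Inkognito's anonymizer: PII detection is assumed to happen elsewhere (entities are passed in as `(text, type)` pairs), and numbered placeholders such as `PERSON_1` stand in for the realistic fake data the tool generates. `ConsistentAnonymizer` is a hypothetical name.

```python
import re
from collections import defaultdict


class ConsistentAnonymizer:
    """Give every distinct PII string one stable replacement across files.

    Sketch only: detection is assumed to happen elsewhere, and numbered
    placeholders such as PERSON_1 stand in for realistic fake data.
    """

    def __init__(self) -> None:
        self.forward: dict[str, str] = {}  # original -> replacement
        self.counters: defaultdict[str, int] = defaultdict(int)  # per entity type

    def replacement_for(self, original: str, entity_type: str) -> str:
        """Reuse an existing replacement, or mint the next one for this type."""
        if original not in self.forward:
            self.counters[entity_type] += 1
            self.forward[original] = f"{entity_type}_{self.counters[entity_type]}"
        return self.forward[original]

    def anonymize(self, text: str, entities: list[tuple[str, str]]) -> str:
        """Replace every known PII string in one regex pass, longest first."""
        for original, entity_type in entities:
            self.replacement_for(original, entity_type)
        if not self.forward:
            return text
        keys = sorted(self.forward, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        return pattern.sub(lambda m: self.forward[m.group(0)], text)

    def vault(self) -> dict[str, str]:
        """Inverse mapping (replacement -> original) used for restoration."""
        return {fake: original for original, fake in self.forward.items()}
```

Because `forward` persists across calls, a name seen in one file is replaced identically in every later file. The single-pass regex also prevents an already inserted replacement from being rewritten again.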
### extract_document
Convert PDF/DOCX to markdown.
```python
extract_document(
    file_path="/path/to/document.pdf",
    extraction_method="auto"  # "auto" or "docling" (others coming soon)
)
```
### segment_document
Split large documents for LLM processing.
```python
segment_document(
    file_path="/path/to/large.md",
    output_dir="/output/segments",
    max_tokens=20000
)
```
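`max_tokens` caps the size of each segment. The following is a minimal sketch of greedy paragraph packing under that cap. It assumes a rough heuristic of about 4 characters per token in place of a real tokenizer. `estimate_tokens` and `segment_markdown` are hypothetical names, not Inkognito's segmenter.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def segment_markdown(markdown: str, max_tokens: int = 20_000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most ~max_tokens.

    Paragraphs (separated by blank lines) are never split, so a single
    paragraph larger than the budget becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in markdown.split("\n\n"):
        para_tokens = estimate_tokens(para)
        # Flush the current chunk if this paragraph would exceed the budget.
        if current and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting only at paragraph boundaries keeps sentences and tables intact, at the cost of chunks that sometimes fall somewhat below the budget.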
### split_into_prompts
Create individual prompts from structured content.
```python
split_into_prompts(
    file_path="/path/to/guide.md",
    output_dir="/output/prompts",
    split_level="h2",  # configurable; the LLM should be able to read the resulting files safely
)
```
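The following sketch shows how `split_level="h2"` might map onto markdown: split at every line that begins with `## `. `split_by_heading` is a hypothetical helper, and it assumes ATX-style (`#`) headings. Text before the first matching heading is kept as a "preamble" section. Fenced code blocks are not detected, so a `## ` line inside one would also trigger a split.

```python
import re


def split_by_heading(markdown: str, split_level: str = "h2") -> list[tuple[str, str]]:
    """Split markdown into (title, body) sections at one ATX heading level."""
    level = int(split_level.lstrip("h"))
    # "h2" -> lines like "## Title" (but not "### Title").
    heading = re.compile(rf"^{'#' * level} +(.+?)\s*$")
    sections: list[tuple[str, list[str]]] = [("preamble", [])]
    for line in markdown.splitlines():
        match = heading.match(line)
        if match:
            sections.append((match.group(1), [line]))
        else:
            sections[-1][1].append(line)
    # Drop sections with no content (e.g. an empty preamble).
    return [(title, "\n".join(lines).strip())
            for title, lines in sections
            if "\n".join(lines).strip()]
```

Deeper headings (`###` and below) remain inside their parent section, so each output file keeps its sub-structure.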
### restore_documents
Restore original PII using vault.
```python
restore_documents(
    directory="/anonymized/docs",
    output_dir="/restored",
    vault_path="/secure/vault.json"
)
```
## Extractor Status
| Extractor | Status | Notes |
| -------------- | -------------------- | -------------------------------------------------------------------------------- |
| **Docling** | ✅ Fully Implemented | Default extractor with OCR support (OCRMac on macOS, EasyOCR on other platforms) |
| **Azure DI** | ⚠️ Placeholder | Requires `AZURE_DI_KEY` environment variable when implemented |
| **LlamaIndex** | ⚠️ Placeholder | Requires `LLAMAPARSE_API_KEY` environment variable when implemented |
| **MinerU** | ⚠️ Placeholder | Will require magic-pdf library when implemented |
## Configuration
Following FastMCP conventions, all configuration is via environment variables:
```bash
# Optional API keys for cloud extractors (when implemented)
export AZURE_DI_KEY="your-key-here"
export LLAMAPARSE_API_KEY="your-key-here"
# Optional OCR languages (comma-separated, default: all available)
export INKOGNITO_OCR_LANGUAGES="en,fr,de"
```
## Examples
### Legal Document Processing
```
You: "Anonymize all contracts in the merger folder for review"
Claude: "I'll anonymize those contracts for you..."
[Processing 23 files...]
✓ Anonymized 23 contracts
✓ Replaced: 145 company names, 89 person names, 67 case numbers
✓ Vault saved to: /output/vault.json
```
### Research Paper Extraction
```
You: "Extract this 300-page research PDF"
Claude: "I'll extract that PDF to markdown..."
[Using Docling for extraction...]
✓ Extracted 300 pages
✓ Preserved: tables, figures, citations
✓ Output size: 487,000 tokens
✓ Saved to: research_paper.md
```
### Documentation to Prompts
```
You: "Split this API documentation into individual prompts"
Claude: "I'll split the documentation by endpoints..."
[Splitting by H2 headings...]
✓ Created 47 prompt files
✓ Each prompt includes endpoint context
✓ Ready for training or testing
```
## Performance
| Extractor | Speed | Requirements | Status |
| ---------- | -------------- | ------------ | ------------ |
| Azure DI | 0.2-1 sec/page | API key | Planned |
| LlamaIndex | 1-2 sec/page | API key | Planned |
| MinerU | 3-7 sec/page | Local, GPU | Planned |
| Docling | 5-10 sec/page | Local, CPU | ✅ Available |
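The per-page figures translate directly into wall-clock estimates. For the 300-page PDF in the examples above, Docling's 5-10 sec/page works out to roughly 25-50 minutes. The following sketch does that arithmetic using only the table's ranges. Actual throughput depends on hardware and page complexity.

```python
# Per-page ranges in seconds, copied from the table above.
SEC_PER_PAGE = {
    "Azure DI": (0.2, 1.0),
    "LlamaIndex": (1.0, 2.0),
    "MinerU": (3.0, 7.0),
    "Docling": (5.0, 10.0),
}


def estimate_minutes(extractor: str, pages: int) -> tuple[float, float]:
    """Return the (best, worst) extraction time in minutes for a document."""
    low, high = SEC_PER_PAGE[extractor]
    return pages * low / 60, pages * high / 60


if __name__ == "__main__":
    best, worst = estimate_minutes("Docling", 300)
    print(f"Docling, 300 pages: {best:.0f}-{worst:.0f} min")  # Docling, 300 pages: 25-50 min
```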
## Privacy & Security
- **Local processing**: No cloud services required
- **No persistence**: Nothing saved without explicit paths
- **Secure vaults**: Encrypted mapping storage
- **API key safety**: Never logged or transmitted
## Development
### Running Locally
```bash
# Clone the repository
git clone https://github.com/phren0logy/inkognito
cd inkognito
# Run with FastMCP CLI
fastmcp dev server.py
# Or run directly in development
uv run python server.py
```
### Testing with FastMCP
```bash
# Install the server configuration
fastmcp install inkognito
# Test a specific tool
fastmcp test inkognito extract_document
```
## Project Structure
```
inkognito/
├── pyproject.toml # FastMCP-compatible packaging
├── LICENSE # MIT license
├── README.md # This file
├── server.py # FastMCP server and entry point
├── anonymizer.py # PII detection and anonymization
├── vault.py # Vault management for reversibility
├── segmenter.py # Document segmentation
├── exceptions.py # Custom exceptions
├── extractors/ # PDF extraction backends
│ ├── __init__.py
│ ├── base.py
│ ├── registry.py
│ ├── docling.py # ✅ Implemented
│ ├── azure_di.py # Placeholder
│ ├── llamaindex.py # Placeholder
│ └── mineru.py # Placeholder
└── tests/
```
## License
MIT License - see LICENSE file for details.