<div align="center" dir="auto">
<img width="250" src="https://duckdb.org/images/logo-dl/DuckDB_Logo-stacked.svg" style="max-width: 100%" alt="DuckDB">
<h1>🧠 Cognee DuckDB Vector Adapter</h1>
</div>
<div align="center" style="margin-top: 20px;">
<span style="display: block; margin-bottom: 10px;">Lightning-fast embedded vector search for Cognee using DuckDB, with planned graph support</span>
<br />
</div>
<div align="center">
<div display="inline-block">
<a href="https://github.com/topoteretes/cognee"><b>Cognee</b></a>
<a href="https://duckdb.org/docs/"><b>DuckDB Docs</b></a>
<a href="#examples"><b>Examples</b></a>
<a href="#troubleshooting"><b>Support</b></a>
</div>
<br />
</div>
## Features
- **Zero-configuration** embedded vector database - no external server required
- Full support for vector embeddings storage and retrieval
- High-performance vector similarity search using DuckDB's native array operations
- Persistent or in-memory database options
- **Vector-first design** with planned graph support in future releases
- Comprehensive error handling and logging
## Installation
```bash
pip install cognee-community-hybrid-adapter-duckdb
```
## Prerequisites
**None!** DuckDB is an embedded database that requires no external dependencies or server setup. Just install and use.
## Examples
Check out the `examples/` folder!
**Basic vector search example:**
```bash
uv run examples/example.py
```
**Document processing example with generated story:**
```bash
uv run examples/simple_document_example/cognee_simple_document_demo.py
```
This example processes a generated story text file (`generated_story.txt`) along with other documents, such as *Alice in Wonderland*.
> You will need an OpenAI API key to run the example scripts.
## Usage
```python
import asyncio

from cognee import config, prune, add, cognify, search, SearchType

# Import the register module to enable DuckDB support (imported for its side effect)
from cognee_community_hybrid_adapter_duckdb import register  # noqa: F401


async def main():
    # Configure DuckDB as the vector database
    config.set_vector_db_config({
        "vector_db_provider": "duckdb",
        "vector_db_url": "my_database.db",  # File path, or None for in-memory
    })

    # Optional: clean previous data
    await prune.prune_data()
    await prune.prune_system()

    # Add your content
    await add("""
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """)

    # Process with cognee
    await cognify()

    # Search (use vector-based search types)
    search_results = await search(
        query_type=SearchType.CHUNKS,
        query_text="Tell me about NLP",
    )

    for result in search_results:
        print("Search result:", result)


if __name__ == "__main__":
    asyncio.run(main())
```
## Configuration
Configure DuckDB as your vector database in cognee:
- `vector_db_provider`: Set to "duckdb"
- `vector_db_url`: Database file path (e.g., "my_db.db"), `None` for in-memory, or MotherDuck URL for cloud
### Database Options
```python
from cognee import config

# Pick ONE of the following configurations.

# Persistent file-based database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "cognee_vectors.db",
})

# In-memory database (fastest, but data is lost on restart)
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": None,  # or ":memory:"
})

# Absolute path to database file
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "/path/to/my/database.db",
})

# MotherDuck cloud database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "md:my_database",  # Replace with your MotherDuck database
})
```
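
The three kinds of `vector_db_url` value can be told apart by shape alone. The sketch below shows one way to classify them before passing a value to `config.set_vector_db_config`. The helper `describe_vector_db_url` is hypothetical and written only for illustration; it is not part of the adapter, and the adapter's own parsing may differ.

```python
from typing import Optional


def describe_vector_db_url(url: Optional[str]) -> str:
    """Classify a `vector_db_url` value into one of the storage modes above.

    Hypothetical helper for illustration; not part of the adapter's API.
    """
    if url is None or url == ":memory:":
        return "in-memory"
    if url.startswith("md:"):
        return "motherduck"
    # Anything else is treated as a relative or absolute file path.
    return "file"


for candidate in ("cognee_vectors.db", None, ":memory:", "md:my_database"):
    print(repr(candidate), "->", describe_vector_db_url(candidate))
```

A check like this can be useful in tests, for example to assert that a CI configuration never points at a persistent file.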
## Requirements
- Python >= 3.12, <= 3.13
- duckdb >= 1.3.2
- cognee >= 0.2.3
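
A quick way to check an environment against these requirements is sketched below, using only the standard library. The helper names `python_supported` and `installed_version` are hypothetical. The version bounds default to the range listed above and are plain parameters, so they can be adjusted if the published package metadata declares a different range.

```python
import sys
from importlib import metadata


def python_supported(version=sys.version_info, minimum=(3, 12), maximum=(3, 13)) -> bool:
    """Return True if the major.minor version falls inside [minimum, maximum]."""
    return minimum <= (version[0], version[1]) <= maximum


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python in supported range:", python_supported())
for package in ("duckdb", "cognee"):
    print(package, "->", installed_version(package) or "not installed")
```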
## Roadmap: Graph Support
This adapter is currently **vector-focused**, with plans to add full graph database capabilities in future releases. The foundation is already in place: the adapter loads the `duckpgq` property graph extension for DuckDB.
**Current Status:**
- ✅ Full vector similarity search
- ✅ Embedding storage and retrieval
- ✅ Collection management
- 🚧 Graph operations (coming soon)
## Error Handling
The adapter includes comprehensive error handling:
- `CollectionNotFoundError`: Raised when attempting operations on non-existent collections
- `InvalidValueError`: Raised for invalid query parameters
- `NotImplementedError`: Currently raised for graph operations (graph support coming soon)
- Graceful handling of database connection issues and embedding errors
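
Because graph operations currently raise `NotImplementedError`, calling code can fall back to vector search until graph support lands. The sketch below shows that pattern with stand-in coroutines: `graph_lookup` and `vector_lookup` are hypothetical placeholders, not adapter or cognee functions. The adapter-specific exceptions are left out because their import paths are not given here.

```python
import asyncio
import logging

logger = logging.getLogger("duckdb_adapter_example")


async def graph_lookup(query: str) -> list[str]:
    # Stand-in for a graph operation, which currently raises NotImplementedError.
    raise NotImplementedError("graph support coming soon")


async def vector_lookup(query: str) -> list[str]:
    # Stand-in for a vector search call.
    return [f"chunk matching {query!r}"]


async def lookup_with_fallback(query: str) -> list[str]:
    """Try the graph path first and fall back to vector search if it is unavailable."""
    try:
        return await graph_lookup(query)
    except NotImplementedError:
        logger.info("Graph operations unavailable; falling back to vector search")
        return await vector_lookup(query)


print(asyncio.run(lookup_with_fallback("NLP")))
```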
## Performance
DuckDB's design gives the adapter several performance-relevant properties:
- **Embedded**: No network overhead - everything runs in-process
- **Columnar**: Optimized storage format for analytical workloads
- **Vectorized**: SIMD operations for fast vector similarity calculations
- **ACID**: Full transactional support with data consistency
- **Memory efficient**: Minimal memory footprint compared to traditional databases
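
To make "vector similarity calculations" concrete, the sketch below ranks stored vectors against a query in plain Python. It assumes cosine similarity as the metric, which is an illustrative choice; the metric the adapter actually uses is not stated here, and DuckDB performs this work inside its own engine rather than in Python loops. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Return the k (id, score) pairs with the highest similarity to the query."""
    scored = [(row_id, cosine_similarity(query, vector)) for row_id, vector in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


rows = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for row_id, score in top_k([1.0, 0.1], rows):
    print(row_id, round(score, 3))
```

This brute-force scan touches every row; an HNSW index such as the one offered by the `vss` extension trades exactness for speed on larger collections.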
## Troubleshooting
### Common Issues
1. **File Permission Errors**: Ensure write permissions to the directory containing your database file
2. **Embedding Dimension Mismatch**: Verify embedding dimensions match collection configuration
3. **Collection Not Found**: Always create collections before adding data points
4. **Graph Operations**: Graph support is planned for future releases; for now, use vector search
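
The first two issues can be caught before any data reaches the database. The sketch below uses only the standard library. `can_write_database` and `check_dimensions` are hypothetical helpers, not adapter functions, and the expected dimension must come from your own embedding model configuration.

```python
import os
import tempfile
from pathlib import Path


def can_write_database(db_path: str) -> bool:
    """Return True if the directory that would hold the database file is writable."""
    parent = Path(db_path).expanduser().resolve().parent
    return parent.is_dir() and os.access(parent, os.W_OK)


def check_dimensions(vectors, expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from the expected dimension."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {expected_dim}"
            )


db_path = os.path.join(tempfile.gettempdir(), "cognee_vectors.db")
print("writable:", can_write_database(db_path))

check_dimensions([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], expected_dim=3)
print("dimensions OK")
```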
### Debug Logging
The adapter uses Cognee's logging system. Enable debug logging to see detailed operation logs:
```python
import logging
logging.getLogger("DuckDBAdapter").setLevel(logging.DEBUG)
```
### Database Option Comparison
| Option | Pros | Cons |
|--------|------|------|
| File-based (`"my_db.db"`) | ✅ Persistent storage<br/>✅ Survives restarts<br/>✅ Can handle large datasets | ❌ Slower I/O<br/>❌ Disk space usage |
| In-memory (`None`) | ✅ Maximum performance<br/>✅ No disk usage<br/>✅ Perfect for testing | ❌ Data lost on restart<br/>❌ Limited by RAM |
| MotherDuck (`"md:database"`) | ✅ Cloud-hosted<br/>✅ Shared access<br/>✅ Managed service<br/>✅ Scalable | ❌ Requires internet<br/>❌ Potential latency<br/>❌ MotherDuck account needed |
## Development
To contribute or modify the adapter:
1. Clone the repository and `cd` into the `packages/hybrid/duckdb` folder
2. Install dependencies: `uv sync --all-extras`
3. Run the example script as a smoke test: `uv run examples/example.py`
4. Make your changes, test, and submit a PR
## Extensions Used
This adapter automatically loads these DuckDB extensions:
- **duckpgq**: Property graph queries (foundation for upcoming graph support)
- **vss**: Vector similarity search with HNSW indexing support
Raw data
{
"_id": null,
"home_page": "https://github.com/topoteretes/cognee-community",
"name": "cognee-community-hybrid-adapter-duckdb",
"maintainer": "Cognee Community",
"docs_url": null,
"requires_python": "<=3.13,>=3.10",
"maintainer_email": "community@cognee.ai",
"keywords": "cognee, duckdb, vector, database, embeddings, ai, ml",
"author": "Cognee Community",
"author_email": "community@cognee.ai",
"download_url": "https://files.pythonhosted.org/packages/93/b8/14132c644806baa1f5397a978afc1b45bc9f62b0a3077a2f50e895762ac1/cognee_community_hybrid_adapter_duckdb-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "DuckDB vector adapter for Cognee with planned graph support",
"version": "0.1.1",
"project_urls": {
"Documentation": "https://docs.cognee.ai",
"Homepage": "https://github.com/topoteretes/cognee-community",
"Issues": "https://github.com/topoteretes/cognee-community/issues",
"README": "https://github.com/topoteretes/cognee-community/blob/main/packages/hybrid/duckdb/README.md",
"Repository": "https://github.com/topoteretes/cognee-community"
},
"split_keywords": [
"cognee",
" duckdb",
" vector",
" database",
" embeddings",
" ai",
" ml"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "859f91099f46f6aca82382add0bf2af6c23bb22b180bb802934d2db03eb8e0b8",
"md5": "4c71d003ab8337aa575b5e92ae973073",
"sha256": "950788f897fd8c2481a3487ef7063ebeb55a95c986b4f96d31e948b3c19e8502"
},
"downloads": -1,
"filename": "cognee_community_hybrid_adapter_duckdb-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4c71d003ab8337aa575b5e92ae973073",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<=3.13,>=3.10",
"size": 10345,
"upload_time": "2025-08-29T16:14:55",
"upload_time_iso_8601": "2025-08-29T16:14:55.851450Z",
"url": "https://files.pythonhosted.org/packages/85/9f/91099f46f6aca82382add0bf2af6c23bb22b180bb802934d2db03eb8e0b8/cognee_community_hybrid_adapter_duckdb-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "93b814132c644806baa1f5397a978afc1b45bc9f62b0a3077a2f50e895762ac1",
"md5": "74c32409887b8fae1a783a871d6e56d3",
"sha256": "c204b995065041b443a46c71c6d129264e7466c7e4d2665d548540f3de3217df"
},
"downloads": -1,
"filename": "cognee_community_hybrid_adapter_duckdb-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "74c32409887b8fae1a783a871d6e56d3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<=3.13,>=3.10",
"size": 13029,
"upload_time": "2025-08-29T16:14:57",
"upload_time_iso_8601": "2025-08-29T16:14:57.042285Z",
"url": "https://files.pythonhosted.org/packages/93/b8/14132c644806baa1f5397a978afc1b45bc9f62b0a3077a2f50e895762ac1/cognee_community_hybrid_adapter_duckdb-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-29 16:14:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "topoteretes",
"github_project": "cognee-community",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "cognee-community-hybrid-adapter-duckdb"
}