# Dify Knowledge Base SDK
A comprehensive Python SDK for interacting with Dify's Knowledge Base API. This SDK provides easy-to-use methods for managing datasets (knowledge bases), documents, segments, and metadata through Dify's REST API.
## Features
- 📚 **Complete API Coverage**: Support for all Dify Knowledge Base API endpoints
- 🔐 **Authentication**: Secure API key-based authentication
- 📄 **Document Management**: Create, update, delete documents from text or files
- 🗂️ **Dataset Operations**: Full CRUD operations for knowledge bases
- ✂️ **Segment Control**: Manage document segments (chunks) with fine-grained control
- 🏷️ **Metadata Support**: Create and manage custom metadata fields
- 🌐 **HTTP Client**: Built on httpx for reliable and fast HTTP communications
- ⚠️ **Error Handling**: Comprehensive error handling with custom exceptions
- 📊 **Progress Monitoring**: Track document indexing progress
- 🔒 **Type Safety**: Full type hints with Pydantic models
## Installation
```bash
pip install dify-knowledge-sdk
```
## Quick Start
```python
from dify_sdk import DifyDatasetClient

# Initialize the client
client = DifyDatasetClient(api_key="your-api-key-here")

# Create a new dataset (knowledge base)
dataset = client.create_dataset(
    name="My Knowledge Base",
    permission="only_me"
)

# Create a document from text
doc_response = client.create_document_by_text(
    dataset_id=dataset.id,
    name="Sample Document",
    text="This is a sample document for the knowledge base.",
    indexing_technique="high_quality"
)

# List all documents
documents = client.list_documents(dataset.id)
print(f"Total documents: {documents.total}")

# Close the client
client.close()
```
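If an exception is raised between creating the client and calling `client.close()`, the close call in the Quick Start is skipped. Below is a minimal sketch of one way to guard against that. It assumes only the `close()` method and the `list_documents(...).total` attribute shown above; `count_documents` is a hypothetical helper name, and `contextlib.closing` comes from the Python standard library.

```python
from contextlib import closing


def count_documents(client, dataset_id):
    """Return the document count for a dataset, closing the client afterwards.

    `client` is any object with `list_documents()` and `close()`,
    such as the DifyDatasetClient from the Quick Start.
    """
    # closing() calls client.close() on exit, even if list_documents() raises.
    with closing(client):
        return client.list_documents(dataset_id).total
```

This suits short scripts that use the client once; a long-lived application would more likely keep one client open and close it at shutdown.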
## Configuration
### API Key
Get your API key from the Dify knowledge base API page:
1. Go to your Dify knowledge base
2. Navigate to the **API** section in the left sidebar
3. Generate or copy your API key from the **API Keys** section
### Base URL
By default, the SDK uses `https://api.dify.ai` as the base URL. You can customize this:
```python
client = DifyDatasetClient(
    api_key="your-api-key",
    base_url="https://your-custom-dify-instance.com",
    timeout=60.0  # Custom timeout in seconds
)
```
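To keep the API key out of source code, the constructor arguments can be read from the environment. The sketch below is one possible approach, not part of the SDK: the variable names `DIFY_API_KEY`, `DIFY_BASE_URL`, and `DIFY_TIMEOUT` and the helper `client_settings_from_env` are hypothetical.

```python
import os


def client_settings_from_env(env=None):
    """Build keyword arguments for DifyDatasetClient from environment variables.

    DIFY_API_KEY, DIFY_BASE_URL and DIFY_TIMEOUT are hypothetical names
    chosen for this sketch; the SDK itself does not define them.
    """
    env = os.environ if env is None else env
    api_key = env.get("DIFY_API_KEY")
    if not api_key:
        raise ValueError("DIFY_API_KEY is not set")

    settings = {"api_key": api_key}
    # Only override the documented defaults when a value is provided.
    if env.get("DIFY_BASE_URL"):
        settings["base_url"] = env["DIFY_BASE_URL"].rstrip("/")
    if env.get("DIFY_TIMEOUT"):
        settings["timeout"] = float(env["DIFY_TIMEOUT"])
    return settings
```

The result can then be unpacked into the constructor, for example `DifyDatasetClient(**client_settings_from_env())`.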
## Core Features
### Dataset Management
```python
# Create a dataset
dataset = client.create_dataset(
    name="Technical Documentation",
    permission="only_me",
    description="Internal technical docs"
)

# List datasets with pagination
datasets = client.list_datasets(page=1, limit=20)

# Delete a dataset by its ID
client.delete_dataset(dataset.id)
```
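Because `list_datasets` returns one page at a time, collecting every dataset means requesting pages until one comes back short. The following is a rough sketch under one assumption that the text above does not confirm: that each page object exposes its items as a `.data` list. `iter_datasets` is a hypothetical helper name.

```python
def iter_datasets(client, limit=20):
    """Yield every dataset by requesting successive pages.

    Assumes each page object exposes its items as a `.data` list; that
    attribute name is an assumption of this sketch.
    """
    page = 1
    while True:
        response = client.list_datasets(page=page, limit=limit)
        items = list(response.data)
        yield from items
        # A short (or empty) page means there is nothing left to fetch.
        if len(items) < limit:
            return
        page += 1
```

Stopping on a short page avoids relying on any other response field, at the cost of one extra request when the total is an exact multiple of `limit`.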
### Document Operations
#### From Text
```python
# Create document from text
doc_response = client.create_document_by_text(
    dataset_id=dataset_id,
    name="API Documentation",
    text="Complete API documentation content...",
    indexing_technique="high_quality",
    process_rule_mode="automatic"
)
```
#### From File
```python
# Create document from file
doc_response = client.create_document_by_file(
    dataset_id=dataset_id,
    file_path="./documentation.pdf",
    indexing_technique="high_quality"
)
```
#### Custom Processing Rules
```python
# Custom processing configuration
process_rule_config = {
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": True}
        ],
        "segmentation": {
            "separator": "###",
            "max_tokens": 500
        }
    }
}

doc_response = client.create_document_by_file(
    dataset_id=dataset_id,
    file_path="document.txt",
    process_rule_mode="custom",
    process_rule_config=process_rule_config
)
```
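When several documents need slightly different rules, building the nested dictionary by hand is error-prone. A small builder, sketched here, produces the same shape as the `process_rule_config` example above. `build_process_rule` is a hypothetical helper, not an SDK function, and the positivity check on `max_tokens` is this sketch's own guard rather than a documented API constraint.

```python
def build_process_rule(separator="###", max_tokens=500,
                       remove_extra_spaces=True, remove_urls_emails=True):
    """Return a config dict shaped like the `process_rule_config` example."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "rules": {
            "pre_processing_rules": [
                {"id": "remove_extra_spaces", "enabled": remove_extra_spaces},
                {"id": "remove_urls_emails", "enabled": remove_urls_emails},
            ],
            "segmentation": {
                "separator": separator,
                "max_tokens": max_tokens,
            },
        }
    }
```

The returned dictionary can be passed as `process_rule_config=build_process_rule(max_tokens=300)` together with `process_rule_mode="custom"`.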
### Segment Management
```python
# Create segments
segments_data = [
    {
        "content": "First segment content",
        "answer": "Answer for first segment",
        "keywords": ["keyword1", "keyword2"]
    },
    {
        "content": "Second segment content",
        "answer": "Answer for second segment",
        "keywords": ["keyword3", "keyword4"]
    }
]

segments = client.create_segments(dataset_id, document_id, segments_data)

# List segments
segments = client.list_segments(dataset_id, document_id)

# Update a segment
client.update_segment(
    dataset_id=dataset_id,
    document_id=document_id,
    segment_id=segment_id,
    segment_data={
        "content": "Updated content",
        "keywords": ["updated", "keywords"],
        "enabled": True
    }
)

# Delete a segment
client.delete_segment(dataset_id, document_id, segment_id)
```
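For manual chunking, the `segments_data` list can be generated from raw text rather than typed out. The sketch below splits on blank lines and emits only the `content` and `keywords` keys from the example above. Whether the `answer` key may be omitted is an assumption, and `paragraphs_to_segments` is a hypothetical helper name.

```python
def paragraphs_to_segments(text, keywords=None):
    """Split text on blank lines into payloads shaped like `segments_data`.

    Only the `content` and `keywords` keys from the example are used;
    leaving out `answer` is an assumption of this sketch.
    """
    segments = []
    for paragraph in text.split("\n\n"):
        content = paragraph.strip()
        if not content:
            continue  # skip empty chunks produced by repeated blank lines
        segment = {"content": content}
        if keywords:
            segment["keywords"] = list(keywords)
        segments.append(segment)
    return segments
```

The result can be passed directly as the third argument of `client.create_segments(dataset_id, document_id, ...)`.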
### Metadata Management
```python
# Create metadata fields
category_field = client.create_metadata_field(
    dataset_id=dataset_id,
    field_type="string",
    name="category"
)

priority_field = client.create_metadata_field(
    dataset_id=dataset_id,
    field_type="number",
    name="priority"
)

# Update document metadata
metadata_operations = [
    {
        "document_id": document_id,
        "metadata_list": [
            {
                "id": category_field.id,
                "value": "technical",
                "name": "category"
            },
            {
                "id": priority_field.id,
                "value": "5",
                "name": "priority"
            }
        ]
    }
]

client.update_document_metadata(dataset_id, metadata_operations)
```
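Each entry in `metadata_operations` repeats the field ID and name for every value, so a helper that assembles the entry from two mappings can reduce duplication when tagging many documents. This is a sketch only; `build_metadata_operation` is a hypothetical name, and the output mirrors the structure shown above.

```python
def build_metadata_operation(document_id, field_ids, values):
    """Build one entry for `metadata_operations`.

    field_ids maps a field name to the id returned by create_metadata_field();
    values maps the same names to the values to assign.
    """
    unknown = set(values) - set(field_ids)
    if unknown:
        raise KeyError(f"No metadata field id for: {sorted(unknown)}")
    return {
        "document_id": document_id,
        "metadata_list": [
            {"id": field_ids[name], "value": value, "name": name}
            for name, value in values.items()
        ],
    }
```

With the fields created above, `field_ids` would be `{"category": category_field.id, "priority": priority_field.id}`, and a list of these entries is what `client.update_document_metadata` receives.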
### Progress Monitoring
```python
# Monitor document indexing progress
status = client.get_document_indexing_status(dataset_id, batch_id)

if status.data:
    indexing_info = status.data[0]
    print(f"Status: {indexing_info.indexing_status}")
    print(f"Progress: {indexing_info.completed_segments}/{indexing_info.total_segments}")
```
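A single status check only shows a snapshot; waiting for indexing to finish needs a polling loop. The sketch below builds on `get_document_indexing_status` and the `indexing_status` attribute shown above. The terminal status strings `"completed"` and `"error"` are assumptions and should be checked against what your Dify instance returns; `wait_for_indexing` is a hypothetical helper name.

```python
import time


def wait_for_indexing(client, dataset_id, batch_id, interval=2.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until every document in the batch finishes indexing.

    The terminal status strings "completed" and "error" are assumptions of
    this sketch; check them against the values your Dify instance returns.
    """
    deadline = clock() + timeout
    while True:
        status = client.get_document_indexing_status(dataset_id, batch_id)
        states = [item.indexing_status for item in status.data or []]
        if "error" in states:
            raise RuntimeError(f"Indexing failed for batch {batch_id}")
        if states and all(state == "completed" for state in states):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Indexing did not finish within {timeout} seconds")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting; normal callers can leave them at their defaults.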
## Error Handling
The SDK provides comprehensive error handling with specific exception types:
```python
from dify_sdk.exceptions import (
    DifyAPIError,
    DifyAuthenticationError,
    DifyValidationError,
    DifyNotFoundError,
    DifyConflictError,
    DifyServerError,
    DifyConnectionError,
    DifyTimeoutError
)

try:
    dataset = client.create_dataset(name="Test Dataset")
except DifyAuthenticationError:
    print("Invalid API key")
except DifyValidationError as e:
    print(f"Validation error: {e}")
except DifyConflictError as e:
    print(f"Conflict: {e}")  # e.g., duplicate dataset name
except DifyAPIError as e:
    print(f"API error: {e}")
    print(f"Status code: {e.status_code}")
    print(f"Error code: {e.error_code}")
```
## Advanced Usage
For more advanced scenarios, see the [examples](./examples/) directory:
- [Basic Usage](./examples/basic_usage.py) - Simple operations and getting started
- [Advanced Usage](./examples/advanced_usage.py) - Complex workflows, batch operations, and monitoring
## API Reference
### Client Configuration
```python
DifyDatasetClient(
    api_key: str,                           # Required: your Dify API key
    base_url: str = "https://api.dify.ai",  # Optional: API base URL
    timeout: float = 30.0                   # Optional: request timeout in seconds
)
```
### Supported File Types
The SDK supports uploading the following file types:
- `txt` - Plain text files
- `md`, `markdown` - Markdown files
- `pdf` - PDF documents
- `html` - HTML files
- `xlsx` - Excel spreadsheets
- `docx` - Word documents
- `csv` - CSV files
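Checking the extension locally before calling `create_document_by_file` avoids a round trip for files that would be rejected. A minimal sketch follows, with the extension set copied from the list above; `SUPPORTED_EXTENSIONS` and `is_supported_file` are hypothetical names, not SDK members.

```python
from pathlib import Path

# Extensions copied from the supported file types list.
SUPPORTED_EXTENSIONS = {"txt", "md", "markdown", "pdf", "html", "xlsx", "docx", "csv"}


def is_supported_file(file_path):
    """Return True if the file's extension is in the supported list."""
    extension = Path(file_path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

The check looks at the file name only and does not inspect the file's contents.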
### Rate Limits
Please respect Dify's API rate limits. The SDK includes automatic error handling for rate limit responses.
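One common way to stay within rate limits is to retry failed calls with exponential backoff. The sketch below is generic and is not how the SDK itself handles rate limit responses. Which exception the SDK raises for a rate limit is not stated here, so the caller supplies the exception classes; `call_with_backoff` is a hypothetical helper name.

```python
import time


def call_with_backoff(func, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on the given exceptions.

    retry_on is an exception class or tuple of classes; which SDK exception
    signals a rate limit is not stated in this README, so the caller chooses.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

For example, `call_with_backoff(lambda: client.list_datasets(page=1, limit=20), retry_on=(DifyConnectionError, DifyTimeoutError))` would retry transient connection and timeout failures using the exception classes listed under Error Handling.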
## Development
### Setup
```bash
# Clone the repository
git clone https://github.com/LeekJay/dify-knowledge-sdk.git
cd dify-knowledge-sdk

# Install dependencies
pip install -e ".[dev]"
```
### Running Tests
```bash
pytest
```
### Code Formatting
```bash
black dify_sdk/
isort dify_sdk/
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Support
- 📖 [Dify Documentation](https://docs.dify.ai/)
- 🐛 [Issue Tracker](https://github.com/LeekJay/dify-knowledge-sdk/issues)
- 💬 [Community Discussions](https://github.com/dify/dify/discussions)
## Changelog
### v0.1.0
- Initial release
- Full Dify Knowledge Base API support
- Complete CRUD operations for datasets, documents, segments, and metadata
- Comprehensive error handling
- Type-safe models with Pydantic
- File upload support
- Progress monitoring
- Examples and documentation
Raw data
{
"_id": null,
"home_page": null,
"name": "dify-knowledge-sdk",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8.1",
"maintainer_email": null,
"keywords": "api, dify, knowledge-base, sdk",
"author": "Dify SDK Team",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/be/5c/7f519bbcdd5f92506a83bde179cf4e50b94d2b2e4ceee790b412bbbbdef7/dify_knowledge_sdk-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Python SDK for Dify Knowledge Base API",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://docs.dify.ai/",
"Homepage": "https://github.com/LeekJay/dify-knowledge-sdk",
"Issues": "https://github.com/LeekJay/dify-knowledge-sdk/issues",
"Repository": "https://github.com/LeekJay/dify-knowledge-sdk"
},
"split_keywords": [
"api",
" dify",
" knowledge-base",
" sdk"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "15d962b42e0d40f7c01ec3a367d6b54c4d2d9827625f002d4946ec8a9a3da85d",
"md5": "abe3564a52e04c482dcf1bf5d6076419",
"sha256": "14cd1272115bda53dee5ff271c371b1034a0cf49b4385ec5ff2e5f9ad3839e37"
},
"downloads": -1,
"filename": "dify_knowledge_sdk-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "abe3564a52e04c482dcf1bf5d6076419",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.1",
"size": 10993,
"upload_time": "2025-08-03T12:54:23",
"upload_time_iso_8601": "2025-08-03T12:54:23.485851Z",
"url": "https://files.pythonhosted.org/packages/15/d9/62b42e0d40f7c01ec3a367d6b54c4d2d9827625f002d4946ec8a9a3da85d/dify_knowledge_sdk-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "be5c7f519bbcdd5f92506a83bde179cf4e50b94d2b2e4ceee790b412bbbbdef7",
"md5": "2d7ed8d4b9acb7475df7c89281fef1d9",
"sha256": "650067bd36087d8db2f91a1f05e349d5219fec6785d96c724e86cbc9ed17bfeb"
},
"downloads": -1,
"filename": "dify_knowledge_sdk-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "2d7ed8d4b9acb7475df7c89281fef1d9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.1",
"size": 82079,
"upload_time": "2025-08-03T12:54:25",
"upload_time_iso_8601": "2025-08-03T12:54:25.638068Z",
"url": "https://files.pythonhosted.org/packages/be/5c/7f519bbcdd5f92506a83bde179cf4e50b94d2b2e4ceee790b412bbbbdef7/dify_knowledge_sdk-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-03 12:54:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "LeekJay",
"github_project": "dify-knowledge-sdk",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "dify-knowledge-sdk"
}