dify-knowledge-sdk

- **Name**: dify-knowledge-sdk
- **Version**: 0.1.0
- **Summary**: Python SDK for Dify Knowledge Base API
- **Upload time**: 2025-08-03 12:54:25
- **Author**: Dify SDK Team
- **Requires Python**: >=3.8.1
- **License**: none recorded
- **Keywords**: api, dify, knowledge-base, sdk
- **Requirements**: none recorded

# Dify Knowledge Base SDK

A comprehensive Python SDK for interacting with Dify's Knowledge Base API. This SDK provides easy-to-use methods for managing datasets (knowledge bases), documents, segments, and metadata through Dify's REST API.

## Features

- 📚 **Complete API Coverage**: Support for all Dify Knowledge Base API endpoints
- 🔐 **Authentication**: Secure API key-based authentication
- 📄 **Document Management**: Create, update, and delete documents from text or files
- 🗂️ **Dataset Operations**: Full CRUD operations for knowledge bases
- ✂️ **Segment Control**: Manage document segments (chunks) with fine-grained control
- 🏷️ **Metadata Support**: Create and manage custom metadata fields
- 🌐 **HTTP Client**: Built on httpx for reliable and fast HTTP communications
- ⚠️ **Error Handling**: Comprehensive error handling with custom exceptions
- 📊 **Progress Monitoring**: Track document indexing progress
- 🔒 **Type Safety**: Full type hints with Pydantic models

## Installation

```bash
pip install dify-knowledge-sdk
```

## Quick Start

```python
from dify_sdk import DifyDatasetClient

# Initialize the client
client = DifyDatasetClient(api_key="your-api-key-here")

# Create a new dataset (knowledge base)
dataset = client.create_dataset(
    name="My Knowledge Base",
    permission="only_me"
)

# Create a document from text
doc_response = client.create_document_by_text(
    dataset_id=dataset.id,
    name="Sample Document",
    text="This is a sample document for the knowledge base.",
    indexing_technique="high_quality"
)

# List all documents
documents = client.list_documents(dataset.id)
print(f"Total documents: {documents.total}")

# Close the client
client.close()
```
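
The Quick Start calls `client.close()` as its last step, so an exception raised earlier would skip it. A minimal sketch of one way to guarantee the close, assuming only that the client exposes `list_documents()` and `close()` as used above (the helper name `total_documents` is hypothetical, and the sketch does not assume the client is itself a context manager):

```python
from contextlib import closing


def total_documents(client, dataset_id):
    """Return the document count for a dataset, then close the client.

    `client` is any object with `list_documents()` and `close()`, as the
    Quick Start uses them. `closing()` calls `close()` even if the request
    raises.
    """
    with closing(client):
        return client.list_documents(dataset_id).total
```

With the real SDK this would be called as `total_documents(DifyDatasetClient(api_key="..."), dataset.id)`.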

## Configuration

### API Key

Get your API key from the Dify knowledge base API page:

1. Go to your Dify knowledge base
2. Navigate to the **API** section in the left sidebar
3. Generate or copy your API key from the **API Keys** section

### Base URL

By default, the SDK uses `https://api.dify.ai` as the base URL. You can customize this:

```python
client = DifyDatasetClient(
    api_key="your-api-key",
    base_url="https://your-custom-dify-instance.com",
    timeout=60.0  # Custom timeout in seconds
)
```
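
To keep the API key out of source code, the constructor arguments can be read from the environment. The sketch below is one possible approach, not part of the SDK: the variable names `DIFY_API_KEY`, `DIFY_BASE_URL`, and `DIFY_TIMEOUT` and the helper `client_kwargs_from_env` are assumptions made for illustration.

```python
import os


def client_kwargs_from_env(env=None):
    """Build keyword arguments for DifyDatasetClient from environment variables.

    The variable names DIFY_API_KEY, DIFY_BASE_URL and DIFY_TIMEOUT are
    hypothetical; the SDK itself does not define them.
    """
    env = os.environ if env is None else env
    api_key = env.get("DIFY_API_KEY")
    if not api_key:
        raise RuntimeError("DIFY_API_KEY is not set")

    kwargs = {"api_key": api_key}
    # Only pass optional settings when they are present, so the SDK
    # defaults described above still apply otherwise.
    if env.get("DIFY_BASE_URL"):
        kwargs["base_url"] = env["DIFY_BASE_URL"]
    if env.get("DIFY_TIMEOUT"):
        kwargs["timeout"] = float(env["DIFY_TIMEOUT"])
    return kwargs
```

The result can be unpacked into the constructor: `DifyDatasetClient(**client_kwargs_from_env())`.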

## Core Features

### Dataset Management

```python
# Create a dataset
dataset = client.create_dataset(
    name="Technical Documentation",
    permission="only_me",
    description="Internal technical docs"
)

# List datasets with pagination
datasets = client.list_datasets(page=1, limit=20)

# Delete a dataset
client.delete_dataset(dataset.id)
```
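
Because `list_datasets` is paginated, collecting every dataset means requesting pages in a loop. The following is a sketch under stated assumptions: it expects the list response to expose `data` and `has_more` attributes, which mirrors Dify's REST payload but is not confirmed for this SDK's models, and `iter_all_datasets` is a hypothetical helper name.

```python
def iter_all_datasets(client, limit=20):
    """Yield every dataset by requesting pages until none remain.

    Assumes the list response has `data` (items on this page) and
    `has_more` attributes; verify both against the SDK's models.
    """
    page = 1
    while True:
        result = client.list_datasets(page=page, limit=limit)
        yield from result.data
        # Stop when the server reports no further pages or a page is empty.
        if not getattr(result, "has_more", False) or not result.data:
            break
        page += 1
```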

### Document Operations

#### From Text

```python
# Create document from text
doc_response = client.create_document_by_text(
    dataset_id=dataset_id,
    name="API Documentation",
    text="Complete API documentation content...",
    indexing_technique="high_quality",
    process_rule_mode="automatic"
)
```

#### From File

```python
# Create document from file
doc_response = client.create_document_by_file(
    dataset_id=dataset_id,
    file_path="./documentation.pdf",
    indexing_technique="high_quality"
)
```

#### Custom Processing Rules

```python
# Custom processing configuration
process_rule_config = {
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": True}
        ],
        "segmentation": {
            "separator": "###",
            "max_tokens": 500
        }
    }
}

doc_response = client.create_document_by_file(
    dataset_id=dataset_id,
    file_path="document.txt",
    process_rule_mode="custom",
    process_rule_config=process_rule_config
)
```
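
When several documents share the same processing rules, the dictionary above can be produced by a small builder. This is only a sketch: `build_process_rule_config` is a hypothetical helper, and it emits exactly the keys and rule ids shown in the example above without validating them against the Dify API.

```python
def build_process_rule_config(separator="###", max_tokens=500,
                              remove_extra_spaces=True,
                              remove_urls_emails=True):
    """Return a process_rule_config dict in the shape shown above."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "rules": {
            "pre_processing_rules": [
                {"id": "remove_extra_spaces", "enabled": remove_extra_spaces},
                {"id": "remove_urls_emails", "enabled": remove_urls_emails},
            ],
            "segmentation": {
                "separator": separator,
                "max_tokens": max_tokens,
            },
        }
    }
```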

### Segment Management

```python
# Create segments
segments_data = [
    {
        "content": "First segment content",
        "answer": "Answer for first segment",
        "keywords": ["keyword1", "keyword2"]
    },
    {
        "content": "Second segment content",
        "answer": "Answer for second segment",
        "keywords": ["keyword3", "keyword4"]
    }
]

segments = client.create_segments(dataset_id, document_id, segments_data)

# List segments
segments = client.list_segments(dataset_id, document_id)

# Update a segment
client.update_segment(
    dataset_id=dataset_id,
    document_id=document_id,
    segment_id=segment_id,
    segment_data={
        "content": "Updated content",
        "keywords": ["updated", "keywords"],
        "enabled": True
    }
)

# Delete a segment
client.delete_segment(dataset_id, document_id, segment_id)
```

### Metadata Management

```python
# Create metadata fields
category_field = client.create_metadata_field(
    dataset_id=dataset_id,
    field_type="string",
    name="category"
)

priority_field = client.create_metadata_field(
    dataset_id=dataset_id,
    field_type="number",
    name="priority"
)

# Update document metadata
metadata_operations = [
    {
        "document_id": document_id,
        "metadata_list": [
            {
                "id": category_field.id,
                "value": "technical",
                "name": "category"
            },
            {
                "id": priority_field.id,
                "value": "5",
                "name": "priority"
            }
        ]
    }
]

client.update_document_metadata(dataset_id, metadata_operations)
```
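
The `metadata_operations` payload repeats each field's id and name, so it can be assembled from the field objects returned by `create_metadata_field`. A minimal sketch, assuming those objects expose an `id` attribute as in the example above; `build_metadata_operation` is a hypothetical helper, not an SDK function.

```python
def build_metadata_operation(document_id, fields, values):
    """Build one entry for metadata_operations.

    fields: mapping of field name -> object with an `id` attribute
    values: mapping of field name -> value to assign to the document
    """
    missing = sorted(set(values) - set(fields))
    if missing:
        raise KeyError(f"no metadata field created for: {', '.join(missing)}")
    return {
        "document_id": document_id,
        "metadata_list": [
            {"id": fields[name].id, "value": value, "name": name}
            for name, value in values.items()
        ],
    }
```

A list of these entries, one per document, can then be passed to `client.update_document_metadata(dataset_id, ...)`.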

### Progress Monitoring

```python
# Monitor document indexing progress.
# batch_id identifies the indexing batch started when the document was created.
status = client.get_document_indexing_status(dataset_id, batch_id)

if status.data:
    indexing_info = status.data[0]
    print(f"Status: {indexing_info.indexing_status}")
    print(f"Progress: {indexing_info.completed_segments}/{indexing_info.total_segments}")
```
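
Indexing is asynchronous, so a single status check usually needs to be repeated until the batch finishes. The loop below is a sketch under stated assumptions: the terminal status names `"completed"` and `"error"` are taken from Dify's REST API rather than from this SDK, and `wait_for_indexing` is a hypothetical helper.

```python
import time


def wait_for_indexing(client, dataset_id, batch_id, poll_interval=2.0,
                      timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until every document in the batch reaches a terminal status.

    The terminal status names "completed" and "error" are assumptions;
    check them against the SDK's models before relying on them.
    """
    deadline = clock() + timeout
    while True:
        status = client.get_document_indexing_status(dataset_id, batch_id)
        states = [item.indexing_status for item in status.data or []]
        if states and all(s in ("completed", "error") for s in states):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"indexing did not finish within {timeout} seconds"
            )
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.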

## Error Handling

The SDK provides comprehensive error handling with specific exception types:

```python
from dify_sdk.exceptions import (
    DifyAPIError,
    DifyAuthenticationError,
    DifyValidationError,
    DifyNotFoundError,
    DifyConflictError,
    DifyServerError,
    DifyConnectionError,
    DifyTimeoutError
)

try:
    dataset = client.create_dataset(name="Test Dataset")
except DifyAuthenticationError:
    print("Invalid API key")
except DifyValidationError as e:
    print(f"Validation error: {e}")
except DifyConflictError as e:
    print(f"Conflict: {e}")  # e.g., duplicate dataset name
except DifyAPIError as e:
    print(f"API error: {e}")
    print(f"Status code: {e.status_code}")
    print(f"Error code: {e.error_code}")
```

## Advanced Usage

For more advanced scenarios, see the [examples](./examples/) directory:

- [Basic Usage](./examples/basic_usage.py) - Simple operations and getting started
- [Advanced Usage](./examples/advanced_usage.py) - Complex workflows, batch operations, and monitoring

## API Reference

### Client Configuration

```python
DifyDatasetClient(
    api_key: str,                           # Required: your Dify API key
    base_url: str = "https://api.dify.ai",  # Optional: API base URL
    timeout: float = 30.0,                  # Optional: request timeout in seconds
)
```
```

### Supported File Types

The SDK supports uploading the following file types:

- `txt` - Plain text files
- `md`, `markdown` - Markdown files
- `pdf` - PDF documents
- `html` - HTML files
- `xlsx` - Excel spreadsheets
- `docx` - Word documents
- `csv` - CSV files

### Rate Limits

Please respect Dify's API rate limits. The SDK includes automatic error handling for rate limit responses.
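
The README does not say which exception a rate-limit response raises or whether the SDK retries on its own, so callers may want their own retry wrapper. The sketch below is generic exponential backoff with jitter; `call_with_backoff` is a hypothetical helper, and the exception type to retry on is supplied by the caller rather than assumed.

```python
import random
import time


def call_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on `retry_on` with exponential backoff.

    `retry_on` is an exception class (or tuple of classes) chosen by the
    caller; this sketch does not know which SDK exception marks a rate limit.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% jitter so concurrent clients do not retry in step.
            sleep(delay + random.uniform(0, delay / 10))
```

A call might look like `call_with_backoff(lambda: client.list_datasets(page=1, limit=20), retry_on=DifyAPIError)`, although retrying on the broad `DifyAPIError` would also repeat requests that failed for reasons other than rate limiting.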

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/LeekJay/dify-knowledge-sdk.git
cd dify-knowledge-sdk

# Install dependencies
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
```

### Code Formatting

```bash
black dify_sdk/
isort dify_sdk/
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- 📖 [Dify Documentation](https://docs.dify.ai/)
- 🐛 [Issue Tracker](https://github.com/LeekJay/dify-knowledge-sdk/issues)
- 💬 [Community Discussions](https://github.com/dify/dify/discussions)

## Changelog

### v0.1.0

- Initial release
- Full Dify Knowledge Base API support
- Complete CRUD operations for datasets, documents, segments, and metadata
- Comprehensive error handling
- Type-safe models with Pydantic
- File upload support
- Progress monitoring
- Examples and documentation

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "dify-knowledge-sdk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8.1",
    "maintainer_email": null,
    "keywords": "api, dify, knowledge-base, sdk",
    "author": "Dify SDK Team",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/be/5c/7f519bbcdd5f92506a83bde179cf4e50b94d2b2e4ceee790b412bbbbdef7/dify_knowledge_sdk-0.1.0.tar.gz",
    "platform": null,
    "description": "(README text, identical to the document above)",
    "bugtrack_url": null,
    "license": null,
    "summary": "Python SDK for Dify Knowledge Base API",
    "version": "0.1.0",
    "project_urls": {
        "Documentation": "https://docs.dify.ai/",
        "Homepage": "https://github.com/LeekJay/dify-knowledge-sdk",
        "Issues": "https://github.com/LeekJay/dify-knowledge-sdk/issues",
        "Repository": "https://github.com/LeekJay/dify-knowledge-sdk"
    },
    "split_keywords": [
        "api",
        " dify",
        " knowledge-base",
        " sdk"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "15d962b42e0d40f7c01ec3a367d6b54c4d2d9827625f002d4946ec8a9a3da85d",
                "md5": "abe3564a52e04c482dcf1bf5d6076419",
                "sha256": "14cd1272115bda53dee5ff271c371b1034a0cf49b4385ec5ff2e5f9ad3839e37"
            },
            "downloads": -1,
            "filename": "dify_knowledge_sdk-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "abe3564a52e04c482dcf1bf5d6076419",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.1",
            "size": 10993,
            "upload_time": "2025-08-03T12:54:23",
            "upload_time_iso_8601": "2025-08-03T12:54:23.485851Z",
            "url": "https://files.pythonhosted.org/packages/15/d9/62b42e0d40f7c01ec3a367d6b54c4d2d9827625f002d4946ec8a9a3da85d/dify_knowledge_sdk-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "be5c7f519bbcdd5f92506a83bde179cf4e50b94d2b2e4ceee790b412bbbbdef7",
                "md5": "2d7ed8d4b9acb7475df7c89281fef1d9",
                "sha256": "650067bd36087d8db2f91a1f05e349d5219fec6785d96c724e86cbc9ed17bfeb"
            },
            "downloads": -1,
            "filename": "dify_knowledge_sdk-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "2d7ed8d4b9acb7475df7c89281fef1d9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.1",
            "size": 82079,
            "upload_time": "2025-08-03T12:54:25",
            "upload_time_iso_8601": "2025-08-03T12:54:25.638068Z",
            "url": "https://files.pythonhosted.org/packages/be/5c/7f519bbcdd5f92506a83bde179cf4e50b94d2b2e4ceee790b412bbbbdef7/dify_knowledge_sdk-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-03 12:54:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LeekJay",
    "github_project": "dify-knowledge-sdk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "dify-knowledge-sdk"
}
        