<p align="center">
<img src="https://raw.githubusercontent.com/firstbatchxyz/.github/refs/heads/master/branding/dria-logo-square.svg" alt="logo" width="168">
</p>
<h1 align="center">Dria SDK</h1>
<p align="center">
<a href="https://pypi.org/project/dria/"><img src="https://badge.fury.io/py/dria.svg" alt="PyPI version"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
<a href="https://docs.dria.co"><img src="https://img.shields.io/badge/docs-online-brightgreen.svg" alt="Documentation Status"></a>
<a href="https://discord.gg/dria" target="_blank">
<img alt="Discord" src="https://dcbadge.vercel.app/api/server/dria?style=flat">
</a>
</p>
**Dria SDK** is a scalable and versatile toolkit for creating and managing synthetic datasets for AI. With Dria, you can orchestrate multi-step pipelines that pull data from both web and siloed sources, blend them with powerful AI model outputs, and produce high-quality synthetic datasets—**no GPU required**.
---
## Why Dria?
- **Dataset Generation**: Easily build synthetic data pipelines using Dria’s flexible APIs.
- **Multi-Agent Network**: Orchestrate complex tasks and data retrieval using specialized agents for web search and siloed APIs.
- **No GPUs Needed**: Offload your compute to the network, accelerating your workflows without personal GPU hardware.
- **Customizable**: Define custom Pydantic schemas to shape the output of your datasets precisely.
- **Model-Rich**: Use different Large Language Models (LLMs), such as OpenAI, Gemini, or Ollama models, to synthesize data.
- **Grounding & Diversity**: Add real-world context to your synthetic datasets with integrated web and siloed data retrieval.
---
## Installation
Dria SDK is available on PyPI. You can install it with:
```bash
pip install dria
```
It’s recommended to use a virtual environment (e.g., `virtualenv` or `conda`) to avoid version conflicts with other packages.
---
## Quick Start
Here’s a minimal example to get you started with Dria:
```python
import asyncio

from pydantic import BaseModel, Field

from dria import Prompt, DatasetGenerator, DriaDataset, Model


# 1. Define the output schema
class Tweet(BaseModel):
    topic: str = Field(..., title="Topic")
    tweet: str = Field(..., title="Tweet")


# 2. Create a dataset
dataset = DriaDataset(name="tweet_test", description="A dataset of tweets!", schema=Tweet)

# 3. Prepare instructions
instructions = [
    {"topic": "BadBadNotGood"},
    {"topic": "Decentralized Synthetic Data"},
]

# 4. Create a Prompt
prompter = Prompt(prompt="Write a tweet about {{topic}}", schema=Tweet)

# 5. Generate data
generator = DatasetGenerator(dataset=dataset)

asyncio.run(
    generator.generate(
        instructions=instructions,
        singletons=prompter,
        models=Model.GPT4O,
    )
)

# Convert to a Pandas DataFrame
df = dataset.to_pandas()
print(df)
```
**Output**:
```
topic tweet
0 BadBadNotGood 🎶 Thrilled to have discovered #BadBadNotGood! ...
1 Decentralized Synthetic Data Exploring the future of #AI with decentralized...
```
---
## Usage
### 1. Define Your Dataset Schema
Use [Pydantic](https://pydantic-docs.helpmanual.io/) models to define the structure of your synthetic data. For example:
```python
from pydantic import BaseModel, Field
class Tweet(BaseModel):
    topic: str = Field(..., title="Topic")
    tweet: str = Field(..., title="Tweet")
```
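Because the schema is a standard Pydantic model, malformed records are rejected before they reach your dataset. A quick illustration of that validation behavior:

```python
from pydantic import BaseModel, Field, ValidationError


class Tweet(BaseModel):
    topic: str = Field(..., title="Topic")
    tweet: str = Field(..., title="Tweet")


# A well-formed record validates cleanly
t = Tweet(topic="AI", tweet="Synthetic data, no GPUs needed.")
print(t.topic)

# A record missing a required field raises ValidationError
try:
    Tweet(topic="AI")
except ValidationError:
    print("record missing 'tweet' was rejected")
```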
### 2. Create a Dataset
Instantiate a `DriaDataset` by specifying its name, description, and the Pydantic schema:
```python
from dria import DriaDataset
dataset = DriaDataset(name="tweet_test", description="A dataset of tweets!", schema=Tweet)
```
### 3. Write a Prompt
Use `Prompt` objects to define how to generate data from an instruction. You can reference fields using double-curly braces:
```python
from dria import Prompt
prompter = Prompt(
    prompt="Write a tweet about {{topic}}",
    schema=Tweet
)
```
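Conceptually, the double-curly placeholders behave like simple template substitution: each `{{name}}` is filled in from the matching instruction key. As an illustration only (this is not the SDK's internal implementation), a minimal renderer might look like:

```python
import re


def render(template: str, values: dict) -> str:
    # Replace each {{name}} placeholder with the matching value from `values`.
    # Illustrative sketch only -- not Dria's actual templating code.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), template)


print(render("Write a tweet about {{topic}}", {"topic": "Cats"}))
# -> Write a tweet about Cats
```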
### 4. Generate Synthetic Data
Create a `DatasetGenerator` and call `generate`:
```python
from dria import DatasetGenerator, Model

generator = DatasetGenerator(dataset=dataset)

instructions = [{"topic": "Cats"}, {"topic": "Dogs"}]

# `generate` is a coroutine: await it inside an async function,
# or wrap the call in `asyncio.run(...)` as in the Quick Start.
await generator.generate(
    instructions=instructions,
    singletons=prompter,
    models=Model.GPT4O  # Example model
)
```
- `instructions`: A list of dictionaries, each used to fill the placeholders in your `Prompt`.
- `singletons`: A single prompt (or list of prompts) that is applied to all instructions.
- `models`: The model or list of models you want to use.
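Since `instructions` is a plain list of dictionaries, it can be built programmatically, for example from a list of topics:

```python
# Build an instruction list from raw topics; each dict fills the
# {{topic}} placeholder of the Prompt above.
topics = ["Cats", "Dogs", "Quantum computing"]
instructions = [{"topic": t} for t in topics]
print(instructions)
```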
### 5. Convert to Pandas
Finally, convert your generated dataset to a Pandas DataFrame:
```python
# pandas must be installed for the conversion
df = dataset.to_pandas()
print(df)
```
> Dria supports exporting datasets to a wide range of formats; the full list is [here](https://docs.dria.co/how-to/dria_datasets_exports). Generation requires tokens in your balance; if needed, the register command approves them automatically.
---
## Advanced Usage
### Available Models
Dria supports a wide range of models from OpenAI, Gemini, Ollama, and more. You can see the full list [here](https://docs.dria.co/how-to/models).
### Writing Workflows and Custom Pipelines
Dria allows you to write custom workflows and pipelines using the `Workflow` class. You can see an example of this [here](https://docs.dria.co/how-to/workflows).
### Structured Outputs
Dria lets you define custom Pydantic schemas for your outputs, producing highly structured data suitable for a wide range of applications.
You can see an example of this [here](https://docs.dria.co/how-to/structured_outputs/).
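Nested Pydantic models work as output schemas too. A hypothetical schema for a Q&A dataset with graded answers might look like this (the field names here are illustrative, not taken from the Dria docs):

```python
from typing import List

from pydantic import BaseModel, Field


class Answer(BaseModel):
    text: str = Field(..., title="Answer text")
    score: float = Field(..., ge=0.0, le=1.0, title="Quality score")


class QARecord(BaseModel):
    question: str = Field(..., title="Question")
    answers: List[Answer] = Field(..., title="Candidate answers")


# Hypothetical record conforming to the nested schema
record = QARecord(
    question="What is synthetic data?",
    answers=[Answer(text="Data generated by models.", score=0.9)],
)
print(record.question)
```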
### Parallelization & Offloading
Because Dria tasks can be dispatched to a distributed network of agents, you can leverage **massive parallelization** without owning any GPUs. This is especially helpful for large-scale data generation tasks:
- Avoid timeouts or rate limits by distributing tasks.
- Scale to thousands or millions of records quickly.
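On the client side, large instruction sets can be submitted in waves rather than all at once. A small batching helper (our own sketch, not part of the SDK) could look like:

```python
def chunk(items: list, size: int) -> list:
    # Split a list into consecutive batches of at most `size` items,
    # e.g. to submit a huge instruction set in manageable waves.
    return [items[i:i + size] for i in range(0, len(items), size)]


batches = chunk([{"topic": f"topic-{i}"} for i in range(10)], 4)
print([len(b) for b in batches])
# -> [4, 4, 2]
```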
---
## Contributing
Contributions are more than welcome! To get started:
1. **Fork** the repository on GitHub.
2. **Clone** your fork locally and create a new branch for your feature or fix.
3. **Install** dependencies in a virtual environment:
   ```bash
   poetry install
   ```
4. **Make Changes** and **Test** them thoroughly.
5. **Submit a Pull Request** describing your changes.
We value all contributions—from bug reports and suggestions to feature implementations.
---
## License
This project is licensed under the [MIT License](LICENSE). Feel free to use it in your personal or commercial projects.
---