dria


- **Name**: dria
- **Version**: 0.0.116
- **Home page**: https://github.com/firstbatchxyz/dria-sdk
- **Summary**: Dria SDK - A Python library for interacting with the Dria Network
- **Upload time**: 2025-02-14 10:36:46
- **Maintainer**: None
- **Docs URL**: None
- **Author**: andthattoo
- **Requires Python**: <4.0,>=3.10
- **License**: None
- **Keywords**: dria, decentralized-ai, ai, agentic

<p align="center">
  <img src="https://raw.githubusercontent.com/firstbatchxyz/.github/refs/heads/master/branding/dria-logo-square.svg" alt="logo" width="168">
</p>

<h1 align="center">Dria SDK</h1>

<p align="center">
  <a href="https://pypi.org/project/dria/"><img src="https://badge.fury.io/py/dria.svg" alt="PyPI version"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
  <a href="https://docs.dria.co"><img src="https://img.shields.io/badge/docs-online-brightgreen.svg" alt="Documentation Status"></a>
      <a href="https://discord.gg/dria" target="_blank">
        <img alt="Discord" src="https://dcbadge.vercel.app/api/server/dria?style=flat">
    </a>
</p>

**Dria SDK** is a scalable and versatile toolkit for creating and managing synthetic datasets for AI. With Dria, you can orchestrate multi-step pipelines that pull data from both web and siloed sources, blend them with powerful AI model outputs, and produce high-quality synthetic datasets—**no GPU required**.  

---

## Why Dria?

- **Dataset Generation**: Easily build synthetic data pipelines using Dria’s flexible APIs.
- **Multi-Agent Network**: Orchestrate complex tasks and data retrieval using specialized agents for web search and siloed APIs.
- **No GPUs Needed**: Offload your compute to the network, accelerating your workflows without personal GPU hardware.
- **Customizable**: Define custom Pydantic schemas to shape the output of your datasets precisely.
- **Model-Rich**: Synthesize data with Large Language Models (LLMs) from providers such as OpenAI, Gemini, and Ollama.
- **Grounding & Diversity**: Add real-world context to your synthetic datasets with integrated web and siloed data retrieval.

---

## Installation

Dria SDK is available on PyPI. You can install it with:
```bash
pip install dria
```

It’s recommended to use a virtual environment (e.g., `virtualenv` or `conda`) to avoid version conflicts with other packages.

---

## Quick Start

Here’s a minimal example to get you started with Dria:

```python
import asyncio
from dria import Prompt, DatasetGenerator, DriaDataset, Model
from pydantic import BaseModel, Field

# 1. Define schema
class Tweet(BaseModel):
    topic: str = Field(..., title="Topic")
    tweet: str = Field(..., title="Tweet")

# 2. Create a dataset
dataset = DriaDataset(name="tweet_test", description="A dataset of tweets!", schema=Tweet)

# 3. Prepare instructions
instructions = [
    {"topic": "BadBadNotGood"},
    {"topic": "Decentralized Synthetic Data"}
]

# 4. Create a Prompt
prompter = Prompt(prompt="Write a tweet about {{topic}}", schema=Tweet)

# 5. Generate data
generator = DatasetGenerator(dataset=dataset)

asyncio.run(
    generator.generate(
        instructions=instructions,
        singletons=prompter,
        models=Model.GPT4O
    )
)

# Convert to Pandas
df = dataset.to_pandas()
print(df)
```

**Output**:
```
                         topic                                              tweet
0                BadBadNotGood  🎶 Thrilled to have discovered #BadBadNotGood! ...
1  Decentralized Synthetic Data  Exploring the future of #AI with decentralized...
```

---

## Usage

### 1. Define Your Dataset Schema

Use [Pydantic](https://pydantic-docs.helpmanual.io/) models to define the structure of your synthetic data. For example:

```python
from pydantic import BaseModel, Field

class Tweet(BaseModel):
    topic: str = Field(..., title="Topic")
    tweet: str = Field(..., title="Tweet")
```

### 2. Create a Dataset

Instantiate a `DriaDataset` by specifying its name, description, and the Pydantic schema:

```python
from dria import DriaDataset

dataset = DriaDataset(name="tweet_test", description="A dataset of tweets!", schema=Tweet)
```

### 3. Write a Prompt

Use `Prompt` objects to define how data is generated from an instruction. Reference instruction fields with double curly braces:

```python
from dria import Prompt

prompter = Prompt(
    prompt="Write a tweet about {{topic}}",
    schema=Tweet
)
```
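
To make the placeholder mechanics concrete, here is a minimal sketch of how a `{{field}}` template could be filled from an instruction dictionary. It is an illustration only: `render_template` is a hypothetical helper built on the standard library, not part of the Dria SDK, and Dria's own substitution logic may differ.

```python
import re

# Hypothetical helper for illustration; not a Dria SDK function.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, instruction: dict) -> str:
    """Replace each {{name}} with instruction[name]; fail loudly on a missing key."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in instruction:
            raise KeyError(f"instruction is missing placeholder '{key}'")
        return str(instruction[key])

    return PLACEHOLDER.sub(substitute, template)


print(render_template("Write a tweet about {{topic}}", {"topic": "Cats"}))
# prints: Write a tweet about Cats
```

Raising on a missing key, rather than leaving the placeholder in the text, surfaces typos in instruction dictionaries before any generation request is sent.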

### 4. Generate Synthetic Data

Create a `DatasetGenerator` and call `generate`:

```python
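import asyncio

from dria import DatasetGenerator, Model

generator = DatasetGenerator(dataset=dataset)

instructions = [{"topic": "Cats"}, {"topic": "Dogs"}]

# generate() is a coroutine, so run it with asyncio.run() from a regular script.
# Inside an async context (e.g., a Jupyter notebook), use `await generator.generate(...)` instead.
asyncio.run(
    generator.generate(
        instructions=instructions,
        singletons=prompter,
        models=Model.GPT4O,  # Example model
    )
)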
```

- `instructions`: A list of dictionaries, each used to fill the placeholders in your `Prompt`.
- `singletons`: A single prompt (or list of prompts) that is applied to all instructions.
- `models`: The model or list of models you want to use.
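
As a rough mental model of the parameters above, each prompt in `singletons` is paired with every entry in `instructions`. The sketch below only illustrates that pairing in plain Python. `expand_tasks` is a hypothetical helper, prompts are represented as plain template strings, and nothing here reflects how Dria schedules work or assigns models internally.

```python
from itertools import product


def expand_tasks(instructions: list[dict], prompts) -> list[tuple[str, dict]]:
    """Pair every prompt template with every instruction (hypothetical helper)."""
    # Accept a single prompt or a list of prompts, mirroring the `singletons` description.
    if isinstance(prompts, str):
        prompts = [prompts]
    return [(prompt, instruction) for prompt, instruction in product(prompts, instructions)]


tasks = expand_tasks(
    [{"topic": "Cats"}, {"topic": "Dogs"}],
    ["Write a tweet about {{topic}}", "Write a haiku about {{topic}}"],
)
print(len(tasks))  # 4: two prompts x two instructions
```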

### 5. Convert to Pandas

Finally, convert your generated dataset to a Pandas DataFrame:

```python
# to_pandas() returns a DataFrame, so no separate pandas import is needed here.
df = dataset.to_pandas()
print(df)
```

> Dria supports a wide range of export formats. You can see the full list [here](https://docs.dria.co/how-to/dria_datasets_exports). You will need to have some tokens in your balance, which will be approved automatically if required by the register command.
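
If you only need a quick dump of rows without extra dependencies, the standard library is enough. The sketch below writes a list of records to a JSON Lines file. The `rows` list is made-up sample data standing in for generated entries, and `write_jsonl` is a hypothetical helper, not one of Dria's export methods.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(rows: list[dict], path: Path) -> int:
    """Write one JSON object per line and return the number of rows written."""
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            # ensure_ascii=False keeps emoji and other non-ASCII text readable.
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


# Made-up sample rows shaped like the Tweet schema.
rows = [
    {"topic": "Cats", "tweet": "Cats are great."},
    {"topic": "Dogs", "tweet": "Dogs are great too."},
]
out_path = Path(tempfile.gettempdir()) / "tweets.jsonl"
written = write_jsonl(rows, out_path)
print(written)  # 2
```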

---

## Advanced Usage

### Available Models

Dria supports a wide range of models from OpenAI, Gemini, Ollama, and more. You can see the full list [here](https://docs.dria.co/how-to/models).

### Writing Workflows and Custom Pipelines

Dria allows you to write custom workflows and pipelines using the `Workflow` class. You can see an example of this [here](https://docs.dria.co/how-to/workflows).


### Structured Outputs

Dria lets you define custom output schemas with Pydantic, so the generated data follows a fixed structure that downstream applications can rely on.

You can see an example of this [here](https://docs.dria.co/how-to/structured_outputs/).

### Parallelization & Offloading

Because Dria tasks can be dispatched to a distributed network of agents, you can leverage **massive parallelization** without owning any GPUs. This is especially helpful for large-scale data generation tasks:
- Avoid timeouts or rate limits by distributing tasks.
- Scale to thousands or millions of records quickly.

---

## Contributing

Contributions are more than welcome! To get started:

1. **Fork** the repository on GitHub.  
2. **Clone** your fork locally and create a new branch for your feature or fix.
3. **Install** dependencies in a virtual environment:  
   ```bash
   poetry install
   ```
4. **Make Changes** and **Test** them thoroughly.
5. **Submit a Pull Request** describing your changes.

We value all contributions—from bug reports and suggestions to feature implementations.

---

## License

This project is licensed under the [MIT License](LICENSE). Feel free to use it in your personal or commercial projects.

---
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/firstbatchxyz/dria-sdk",
    "name": "dria",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "dria, decentralized-ai, ai, agentic",
    "author": "andthattoo",
    "author_email": "omer@firstbatch.xyz",
    "download_url": "https://files.pythonhosted.org/packages/2b/21/0b02d835c4f37d993982229fcba530dabb4c5dee956ae5264f7e9c77fa41/dria-0.0.116.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Dria SDK - A Python library for interacting with the Dria Network",
    "version": "0.0.116",
    "project_urls": {
        "Bug Tracker": "https://github.com/firstbatchxyz/dria-sdk/issues",
        "Documentation": "https://docs.dria.co",
        "Homepage": "https://github.com/firstbatchxyz/dria-sdk",
        "Repository": "https://github.com/firstbatchxyz/dria-sdk"
    },
    "split_keywords": [
        "dria",
        " decentralized-ai",
        " ai",
        " agentic"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f7008bb05326dc497c7560ea02bf191b577830a0d8954e25f4f3cb56d5aebfcb",
                "md5": "a35a893201000e83a97f426d74a05f05",
                "sha256": "49a888969296acbdfeeb9906bd55a6c8dead95007d5d086c30683c1f7202e65f"
            },
            "downloads": -1,
            "filename": "dria-0.0.116-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a35a893201000e83a97f426d74a05f05",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 114705,
            "upload_time": "2025-02-14T10:36:44",
            "upload_time_iso_8601": "2025-02-14T10:36:44.984811Z",
            "url": "https://files.pythonhosted.org/packages/f7/00/8bb05326dc497c7560ea02bf191b577830a0d8954e25f4f3cb56d5aebfcb/dria-0.0.116-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2b210b02d835c4f37d993982229fcba530dabb4c5dee956ae5264f7e9c77fa41",
                "md5": "cfe047e9c1e64c195f2e45ad20f9eb3e",
                "sha256": "5f5039e8824b2b5bcf85811c20d4748c513801d099d2ba9d7d1303fb409ba905"
            },
            "downloads": -1,
            "filename": "dria-0.0.116.tar.gz",
            "has_sig": false,
            "md5_digest": "cfe047e9c1e64c195f2e45ad20f9eb3e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 77098,
            "upload_time": "2025-02-14T10:36:46",
            "upload_time_iso_8601": "2025-02-14T10:36:46.448140Z",
            "url": "https://files.pythonhosted.org/packages/2b/21/0b02d835c4f37d993982229fcba530dabb4c5dee956ae5264f7e9c77fa41/dria-0.0.116.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-14 10:36:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "firstbatchxyz",
    "github_project": "dria-sdk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "dria"
}
        