queryboost


Name: queryboost
Version: 0.1.1
Summary: The official Python client for the Queryboost API
Homepage: https://queryboost.com
Repository: https://github.com/queryboost/queryboost
Upload time: 2025-10-18 04:35:31
Requires Python: >=3.12
Keywords: ai, batch, inference, queryboost, streaming
# Queryboost Python SDK

The official Python SDK for the Queryboost API, built on gRPC for high-performance AI data processing.

## Installation

```bash
pip install queryboost
```

## Features

Queryboost introduces a new architecture for AI-native data processing at scale:

- **Distributed Continuous Batching** — Ensures that GPUs are fully utilized by continuously batching and scheduling in-flight requests to a pool of reserved GPUs, maximizing throughput and lowering cost per inference.

- **Bidirectional Streaming** — Delivers results in real time as they're ready via gRPC streaming. No need to wait for the entire dataset to finish processing.

- **Column-Aware Prompting** — Run AI queries over your data with column context using native `{column_name}` syntax.

- **Schema-Aware AI Pipeline** — Automatically infers output schema from your data and then generates consistent, schema-compliant structured outputs.

- **Structured Columnar Outputs** — Returns data in Apache Arrow format with a consistent schema, ready to plug directly into analytics and BI workflows.

- **Probability Scores** — Every output includes per-column probability scores derived from token-level log probabilities, enabling quality filtering and uncertainty quantification. For boolean outputs, scores represent P(True).

- **Optimized Model** — Powered by `queryboost-4b`, a 4B-parameter model optimized for data processing tasks with structured outputs. Outperforms larger models on reading comprehension and natural language inference. [See benchmarks →](https://queryboost.com/#benchmarks)
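The `{column_name}` syntax above mirrors Python's format-field notation. As an illustration only (the actual substitution happens server-side in Queryboost), here is how such a template expands against a single row of data; the row contents are made up for the example:

```python
# Illustration: expanding a {column_name} prompt template against one row
# using Python's built-in str.format_map. This is not Queryboost's
# implementation, just a demonstration of the syntax.
prompt = "Did the customer's issue get resolved in this {chat_transcript}? Explain briefly."

row = {"chat_id": 1, "chat_transcript": "Customer: The reset link worked. Thanks!"}

filled = prompt.format_map(row)  # substitutes {chat_transcript} with the row value
print(filled)
```

Each row's column values are inserted into the prompt, so every row becomes its own fully contextualized query.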

The Python SDK provides a simple interface to the Queryboost API:

- **Flexible Data Input** — Supports Hugging Face Datasets, IterableDatasets, lists of dictionaries, or custom iterators for large data streams.

- **Pipeline Orchestration** — Starts the bidirectional streaming connection and tracks progress concurrently.

- **Extensible Output Handlers** — Save results locally with `LocalParquetBatchHandler` (default) or implement custom handlers for S3, databases, or any destination.
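Because the SDK accepts plain iterators of dictionaries, large files can be streamed row by row without loading everything into memory. A minimal sketch, assuming a generator like this could be passed as the `data` argument (the CSV contents and field names are illustrative):

```python
# Sketch: streaming rows from a CSV source as an iterator of dicts,
# the kind of input the SDK's flexible-data-input feature describes.
import csv
import io

def rows_from_csv(text: str):
    """Yield one dict per CSV row, mimicking a large-file stream."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row

csv_text = "chat_id,chat_transcript\n1,Issue resolved\n2,Still waiting\n"
data = list(rows_from_csv(csv_text))  # materialized here only for demonstration
print(data[0]["chat_transcript"])
```

In practice the generator would be handed to `qb.run` directly so rows are pulled lazily as the stream advances.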

## Usage

```python
import os
from pathlib import Path
import pyarrow.parquet as pq
from datasets import load_dataset
from queryboost import Queryboost

qb = Queryboost(
    # This is the default and can be omitted
    api_key=os.getenv("QUERYBOOST_API_KEY")
)

# Load data (supports HF Dataset, IterableDataset, list of dicts, or iterator of dicts)
data = load_dataset("queryboost/OpenCustConvo", split="train")

# Select first 160 rows
data = data.select(range(160))

# Use {column_name} to insert column values into prompt
prompt = "Did the customer's issue get resolved in this {chat_transcript}? Explain briefly."

# Process data with bidirectional streaming
# By default, results are saved to ~/.cache/queryboost/ as parquet files
# Pass a custom BatchHandler to save elsewhere (e.g., S3, database)
qb.run(
    data,
    prompt,
    name="cust_convo_analysis", # Unique name for this run
    num_gpus=5, # Number of GPUs to reserve for this run
)

# Read the results
table = pq.read_table(Path("~/.cache/queryboost/cust_convo_analysis").expanduser())
```

When you run this code, Queryboost executes end-to-end distributed streaming to process your data with AI:

- **Streams** batches of rows to the API via bidirectional gRPC
- **Distributes** the data stream across a pool of reserved GPUs for parallel processing
- **Dynamically and continuously batches** rows on the server side
- **Generates** structured outputs with consistent schema and associated probability scores
- **Streams** batches of results back from the API in real time
- **Saves** results as Parquet files to `~/.cache/queryboost/cust_convo_analysis`

> **Note:** Custom `BatchHandler` implementations can save results to remote storage like S3, databases, etc. See [Custom Batch Handlers](#custom-batch-handlers) below.

### Example

Using the customer service transcript example from above:

**Input data**

| chat_id | chat_transcript                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 1       | Customer: Hi, I'm having trouble logging into my account. The password reset link isn't working.<br>Agent: I understand that's frustrating. Let me help you with that. I've just sent a new password reset link to your email.<br>Customer: Got it! The new link worked. Thanks for your help!<br>Agent: You're welcome! Is there anything else I can assist you with today?                                                                                                                                                               |
| 2       | Customer: My order #12345 hasn't arrived yet. It's been 2 weeks.<br>Agent: I apologize for the delay. Let me check the status for you... I see there was a shipping delay due to weather conditions. We don't have an updated delivery estimate yet.<br>Customer: This is really frustrating. I need this order urgently.<br>Agent: I understand your frustration. I'll escalate this to our shipping department and have them contact you within 24 hours with an update.<br>Customer: Fine, but I'm very disappointed with this service. |

**Output**

| chat_id | is_resolved | explanation                                                                                                             | is_resolved_prob | explanation_prob |
| ------- | ----------- | ----------------------------------------------------------------------------------------------------------------------- | ---------------- | ---------------- |
| 1       | True        | The customer was able to successfully reset their password with the new link provided by the agent.                     | 0.9823           | 0.9547           |
| 2       | False       | The shipping issue remains unresolved, with the agent only promising to escalate and provide an update within 24 hours. | 0.0421           | 0.9312           |

> **Note:** Probability scores are derived from token-level log probabilities. For boolean columns like `is_resolved`, the score represents P(True).
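One practical use of these scores is quality filtering. A small sketch, using plain Python rows with the column names from the example output above and illustrative thresholds:

```python
# Sketch: quality-filtering results by probability score.
# For a boolean column, the score is P(True), so values near 1.0 mean
# confident True and values near 0.0 mean confident False.
rows = [
    {"chat_id": 1, "is_resolved": True, "is_resolved_prob": 0.9823},
    {"chat_id": 2, "is_resolved": False, "is_resolved_prob": 0.0421},
]

# Keep rows where the model is confident in either direction:
# P(True) >= 0.9 (confident True) or P(True) <= 0.1 (confident False).
confident = [
    r for r in rows
    if r["is_resolved_prob"] >= 0.9 or r["is_resolved_prob"] <= 0.1
]
```

Rows falling between the thresholds could instead be routed to human review, which is the uncertainty-quantification use case the feature list mentions.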

## Custom Batch Handlers

Batch handlers control what Queryboost does with results as they stream in from the API: write them to local files, upload them to cloud storage, insert them into databases, or run custom post-processing.

Queryboost includes `LocalParquetBatchHandler` by default. You can create custom handlers for other destinations:

- Databases (PostgreSQL, Snowflake, BigQuery, etc.)
- Object stores (S3, GCS)

Custom handlers inherit from `BatchHandler` and implement a `handle(batch, batch_idx)` method. See `src/queryboost/handlers/local.py` for a reference implementation.
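A hypothetical sketch of that interface is below. A real handler would subclass `BatchHandler` from the SDK; here a plain class mirrors the `handle(batch, batch_idx)` method described above so the sketch stays self-contained, and the batch payload is a stand-in (the actual batch type is likely an Arrow table):

```python
# Hypothetical handler mirroring the handle(batch, batch_idx) interface.
# A production version would subclass queryboost's BatchHandler and
# forward each batch to S3, a database, etc.; this one just collects
# batches in memory for inspection.
class InMemoryBatchHandler:
    def __init__(self):
        self.batches = []

    def handle(self, batch, batch_idx):
        """Called once per result batch as it streams in from the API."""
        self.batches.append((batch_idx, batch))

handler = InMemoryBatchHandler()
handler.handle({"chat_id": [1, 2]}, 0)  # stand-in batch payload
```

Because each batch is handled as it arrives, a handler of this shape can persist partial results long before the full run completes.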

## License

Copyright © 2025 Queryboost Inc.

Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.

            
