csvai


* Name: csvai
* Version: 0.2.0
* Summary: Enrich CSV or Excel rows using OpenAI models (text or image analysis).
* Upload time: 2025-09-10 17:17:19
* Author: Zyxware Technologies, Vimal Joseph
* Requires Python: >=3.9
* Keywords: csv, excel, ai, openai, data enrichment, llm, automation, vision, image analysis
* Requirements: openai, jinja2, python-dotenv, pandas, openpyxl, streamlit

# CSVAI: Apply an AI prompt to each row in a CSV or Excel file and write enriched results

The `csvai` library reads an input CSV or Excel file, renders a prompt for each row (you can use raw column names like `{{ Address }}`), calls an **OpenAI model via the Responses API**, and writes the original columns plus AI-generated fields to an output CSV or Excel file. It also supports **image analysis** (vision) when enabled.

The tool is **async + concurrent**, **resumable**, and **crash-safe**. It supports **Structured Outputs** with a **JSON Schema** for reliable JSON, or **JSON mode** (without a schema) if you prefer a lighter setup.

We also have a **CSV AI Prompt Builder** (a Custom GPT) to help you generate prompts and JSON Schemas tailored to your CSVs.

---

## Features

* **Structured Outputs**: enforce exact JSON with a schema for consistent, validated results.
* **JSON mode**: force a single JSON object without defining a schema.
* **Async & concurrent**: process many rows in parallel for faster throughput.
* **Resumable**: rows already written (by `id`) are skipped on re-run.
* **CSV or Excel**: handle `.csv` and `.xlsx` inputs and outputs.
* **Image analysis**: add `--process-image` to attach an image per row (via URL or local file) to multimodal models like `gpt-4o-mini`.

---

## Installation

Requires Python **3.9+**.

OpenAI API key: create a key at https://platform.openai.com/api-keys and set it in a `.env` file as `OPENAI_API_KEY=...`.
See `example.env` in the [project repo](https://github.com/zyxware/csvai).

### From PyPI

```bash
pip install csvai
# Include Streamlit UI dependencies
pip install "csvai[ui]"
```

### From GitHub

```bash
# Install directly from the repository
pip install git+https://github.com/zyxware/csvai.git
# With Streamlit UI dependencies
pip install "csvai[ui] @ git+https://github.com/zyxware/csvai.git"
```

### For local development

```bash
git clone https://github.com/zyxware/csvai
cd csvai
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
cp example.env .env
# Edit .env and set OPENAI_API_KEY=sk-...
```

Installing the package exposes the `csvai` CLI and the `csvai-ui` command.

---

## Usage

### CLI

#### Auto-discovery

If you name your files like this:

```
input.csv      # or input.xlsx
input.prompt.txt
input.schema.json   # optional
```

Run:

```bash
csvai input.csv      # or input.xlsx
```
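
As a rough illustration of the naming convention above, the sketch below derives the sibling prompt and schema paths from an input path. The helper name `discover_sidecars` is hypothetical and is not part of the `csvai` API; only the two suffixes come from the file names listed above.

```python
from pathlib import Path
from typing import Optional, Tuple


def discover_sidecars(input_path: str) -> Tuple[Optional[Path], Optional[Path]]:
    """Return (prompt_path, schema_path) found next to the input file.

    Each element is None when the matching file does not exist.
    Hypothetical helper: csvai's real lookup may differ.
    """
    src = Path(input_path)
    # "input.csv" -> "input.prompt.txt" and "input.schema.json"
    prompt = src.with_name(src.stem + ".prompt.txt")
    schema = src.with_name(src.stem + ".schema.json")
    return (
        prompt if prompt.is_file() else None,
        schema if schema.is_file() else None,
    )
```

With only `input.csv` and `input.prompt.txt` on disk, the second element is `None`, which corresponds to the "schema is optional" case.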

#### Or specify prompt & schema explicitly

```bash
# With a prompt and a strict schema (best reliability)
csvai address.xlsx --prompt address.prompt.txt --schema address.schema.json

# Or JSON mode (no schema; still a single JSON object)
csvai address.xlsx --prompt address.prompt.txt
```

Sample datasets (`address.csv` and `address.xlsx`) with the matching prompt and schema live in the `examples/` directory.

### Streamlit UI

After installing with the `ui` extra, launch the web interface:

```bash
csvai-ui
```

The UI lets you upload a CSV/Excel file and provide a prompt and an optional schema.
A "Process images" toggle is available to attach an image per row; you can set the image column (default `image`) and the image root directory (default `./images`).

---

## Example Prompt & Schema

### Prompt (`address.prompt.txt`)

```text
Extract city, state, and country from the given address.

Rules:
- city: city/town/locality (preserve accents, proper case)
- state: ISO-standard name of the state/region/province or "" if none
- country: ISO 3166 English short name of the country; infer if obvious, else ""
- Ignore descriptors like "(EU)"
- Do not guess street-level info

Inputs:
Address: {{Address}}
```

### Schema (`address.schema.json`)

```json
{
  "type": "object",
  "properties": {
    "city": {
      "type": "string",
      "description": "City, town, or locality name with correct casing and accents preserved"
    },
    "state": {
      "type": "string",
      "description": "ISO-standard name of the state, region, or province, or empty string if none"
    },
    "country": {
      "type": "string",
      "description": "ISO 3166 English short name of the country, inferred if obvious, else empty string"
    }
  },
  "required": ["city", "state", "country"],
  "additionalProperties": false
}
```
---

## Creating Prompts & Schemas with the CSV AI Prompt Builder

You can use the **[CSV AI Prompt Builder](https://chat.openai.com/g/g-689d8067bd888191a896d2cfdab27a39-csv-ai-prompt-builder)** custom GPT to:

* Quickly design a **prompt** tailored to your CSV data.
* Generate a **JSON Schema** that matches your desired structured output.

**Example input to the builder:**

```
File: reviews.csv. Inputs: title,body,stars. Output: sentiment,summary.
```

**Example result:**

**Prompt**

```
Analyze each review and produce sentiment and a concise summary.

Rules:
- sentiment: one of positive, neutral, negative.
- Star mapping: stars ≤ 2 ⇒ negative; 3 ⇒ neutral; ≥ 4 ⇒ positive. If stars is blank or invalid, infer from tone.
- summary: 1–2 sentences, factual, include key pros/cons, no emojis, no first person, no marketing fluff.
- Use the same language as the Body.
- Return only the fields required by the tool schema.

Inputs:
Title: {{title}}
Body: {{body}}
Stars: {{stars}}
```

**Schema**

```json
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "description": "Overall sentiment derived from stars and/or tone; one of positive, neutral, negative",
      "enum": ["positive", "neutral", "negative"]
    },
    "summary": {
      "type": "string",
      "description": "Concise 1–2 sentence summary capturing key pros/cons without opinionated fluff"
    }
  },
  "required": ["sentiment", "summary"],
  "additionalProperties": false
}
```

**Command to execute**

```bash
python -m csvai.cli reviews.csv --prompt reviews.prompt.txt --schema reviews.schema.json
```

**Tip: have the builder generate a schema for you.** Example requests:

* **I have `products.csv` with Product Title, Product Description, Category, and Sub Category. Help me enrich with SEO meta fields.**
* **I have `reviews.csv` with Title, Body, and Stars. Help me extract sentiment and generate a short summary.**
* **I have `address.csv` with an Address field. Help me extract City, State, and Country using ISO-standard names.**
* **I have `tickets.csv` with Subject and Description. Help me classify each ticket into predefined support categories.**
* **I have `posts.csv` with Title, Body, URL, Image URL, Brand, and Platform. Help me generate social media captions, hashtags, emojis, CTAs, and alt text.**
* **I have `jobs.csv` with Job Title and Description. Help me categorize jobs into sectors and identify the level of seniority.**


---

## CLI

```bash
csvai INPUT.csv [--prompt PROMPT_FILE] [--output OUTPUT_FILE]
                          [--limit N] [--model MODEL] [--schema SCHEMA_FILE]
                          [--process-image] [--image-col COL] [--image-root DIR]
```

**Flags**

* `--prompt, -p` — path to a plaintext prompt file (Jinja template).
* `--output, -o` — output CSV path (default: `<input>_enriched.csv`).
* `--limit` — process only the first `N` new/pending rows.
* `--model` — model name (default from `.env`, falls back to `gpt-4o-mini`).
* `--schema` — path to a JSON Schema for structured outputs (optional).
* `--process-image` — enable image analysis; when set, attaches an image per row if available.
* `--image-col` — name of the image column (default: `image`).
* `--image-root` — directory to resolve local image filenames (default: `./images`).

Notes on images:
- If the image cell is blank, the row is processed as text-only.
- If the cell is a full URL (`http(s)://...`), the model fetches it.
- Otherwise the value is treated as a filename: resolved as an absolute/relative path first, then `./images/<filename>`.
- If a referenced file is missing/unreadable, the tool logs a warning and proceeds text-only.
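
The resolution order in the notes above can be sketched as follows. The function name `resolve_image` is an assumption made for illustration and does not claim to match the tool's internals; it only mirrors the four rules listed.

```python
from pathlib import Path
from typing import Optional


def resolve_image(cell: Optional[str], image_root: str = "./images") -> Optional[str]:
    """Return a URL or an existing local path for an image cell, else None.

    None means "process this row as text-only". Hypothetical helper.
    """
    value = (cell or "").strip()
    if not value:
        return None  # blank cell: text-only
    if value.lower().startswith(("http://", "https://")):
        return value  # full URL: passed through unchanged
    # Try the value as given (absolute or relative), then under the image root.
    for candidate in (Path(value), Path(image_root) / value):
        if candidate.is_file():
            return str(candidate)
    return None  # missing file: the caller would log a warning and go text-only
```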

---

## Environment Variables (`.env`)

See [`example.env`](example.env) for all configurable variables.

```ini
OPENAI_API_KEY=sk-...
DEFAULT_MODEL=gpt-4o-mini
MAX_OUTPUT_TOKENS=600
TEMPERATURE=0.2
MAX_CONCURRENT_REQUESTS=12
PROCESSING_BATCH_SIZE=100
REQUEST_TIMEOUT=45
ALT_PROMPT_SUFFIX=.prompt.txt
OUTPUT_FILE_SUFFIX=_enriched.csv
```
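
For readers wiring similar settings into their own scripts, here is a minimal sketch of reading these variables with typed fallbacks. The `Settings` class and `load_settings` function are hypothetical. The `gpt-4o-mini` fallback and the concurrency default of 10 are stated elsewhere in this README; the remaining fallbacks simply reuse the sample values above and are assumptions. In practice the `.env` file would first be loaded into the environment, for example with `load_dotenv()` from `python-dotenv`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model: str
    max_output_tokens: int
    temperature: float
    max_concurrent_requests: int
    batch_size: int
    request_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (fallback values partly assumed)."""
    return Settings(
        model=env.get("DEFAULT_MODEL", "gpt-4o-mini"),
        max_output_tokens=int(env.get("MAX_OUTPUT_TOKENS", "600")),      # assumed fallback
        temperature=float(env.get("TEMPERATURE", "0.2")),                # assumed fallback
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "10")),
        batch_size=int(env.get("PROCESSING_BATCH_SIZE", "100")),         # assumed fallback
        request_timeout=float(env.get("REQUEST_TIMEOUT", "45")),         # assumed fallback
    )
```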

---

## Input/Output Behavior

* **Input CSV**: the script reads all rows. If an `id` column exists, it’s used to resume. If not, rows are indexed `0..N-1` internally for this run.
* **Prompt rendering**: every row is sanitized so `{{ Raw Header }}` becomes `{{ Raw_Header }}`. You can also reference the raw values as `{{ raw["Raw Header"] }}` if needed.
* **Output CSV**: contains the original columns plus AI-generated fields. The **header is fixed** after the first successful batch; later rows are written with the same header order.
* **Resume**: rerunning skips rows whose `id` is already present in the output file.
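
The header sanitization described above might look roughly like this. It is a sketch under stated assumptions: the names `sanitize` and `prepare` are hypothetical, and the exact replacement rules used by `csvai` are not documented here, so the regex is only one plausible choice.

```python
import re
from typing import Dict, Tuple


def sanitize(header: str) -> str:
    """Turn a raw header such as 'Raw Header' into an identifier such as 'Raw_Header'."""
    name = re.sub(r"\W+", "_", header.strip()).strip("_")
    # Identifiers cannot start with a digit, so prefix an underscore in that case.
    return f"_{name}" if name[:1].isdigit() else name


def prepare(template: str, row: Dict[str, str]) -> Tuple[str, Dict[str, object]]:
    """Rewrite raw-header placeholders and build the matching render context."""
    mapping = {header: sanitize(header) for header in row}

    def swap(match: "re.Match[str]") -> str:
        inner = match.group(1)
        # Only placeholders that are exactly a raw header get rewritten.
        return "{{ " + mapping.get(inner, inner) + " }}"

    rewritten = re.sub(r"\{\{\s*(.*?)\s*\}\}", swap, template)
    context: Dict[str, object] = {mapping[h]: v for h, v in row.items()}
    context["raw"] = dict(row)  # keeps {{ raw["Raw Header"] }} usable
    return rewritten, context
```

The rewritten template and context could then be rendered with Jinja, for example `jinja2.Template(rewritten).render(**context)`.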

---

## Image Analysis Example

Files in `examples/`:

- `image.csv` — demo rows with an image URL, a local filename, and a blank image.
- `image.prompt.txt` — prompt to produce a one-sentence `description`.
- `image.schema.json` — schema requiring the `description` field.

Local image (for row 2): place a file at `./images/sample.jpg` (relative to your current working directory). For convenience you can download a sample image, for example:

```bash
mkdir -p images
curl -L -o images/sample.jpg https://upload.wikimedia.org/wikipedia/commons/3/3f/JPEG_example_flower.jpg
```

Run the example (multimodal enabled):

```bash
csvai examples/image.csv \
  --prompt examples/image.prompt.txt \
  --schema examples/image.schema.json \
  --process-image
```

Notes:
- The image column defaults to `image`; override with `--image-col` if needed.
- Local filenames are resolved as-is first; if not found, `./images/<filename>` is tried.
- If an image is missing or unreadable, the row is processed as text-only and a warning is logged.

---

## Structured Outputs vs JSON Mode

### Structured Outputs (recommended)

When you pass `--schema`, the request includes:

```python
text={
  "format": {
    "type": "json_schema",
    "name": "row_schema",
    "schema": schema,
    "strict": True
  }
}
```

This guarantees the model returns **exactly** the keys/types you expect.

### JSON Mode (no schema)

When no schema is provided, the request includes:

```python
text={"format": {"type": "json_object"}}
```

The model must still return a single JSON object, but no exact schema is enforced.
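
The choice between the two modes can be expressed as one small function. This is an illustrative sketch: `build_text_format` is a hypothetical name, and the dictionaries it returns are exactly the two `text` payloads shown above.

```python
from typing import Any, Dict, Optional


def build_text_format(schema: Optional[Dict[str, Any]] = None,
                      name: str = "row_schema") -> Dict[str, Any]:
    """Return the `text` argument: strict JSON Schema if a schema is given, else JSON mode."""
    if schema is None:
        # JSON mode: a single JSON object, no schema enforced.
        return {"format": {"type": "json_object"}}
    # Structured Outputs: `name` must sit alongside `type` and `schema`.
    return {
        "format": {
            "type": "json_schema",
            "name": name,
            "schema": schema,
            "strict": True,
        }
    }
```

The returned dictionary is meant to be passed as the `text=` argument of the request.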

> **Prompting tip:** mention the word **JSON** in your prompt and explicitly list the expected fields to improve compliance in JSON mode.

---

## Performance & Concurrency

* Concurrency is controlled by `MAX_CONCURRENT_REQUESTS`.
* Increase gradually; too high can trigger API rate limits.
* `PROCESSING_BATCH_SIZE` controls how many results are written per batch.
* `REQUEST_TIMEOUT` guards slow requests; the script retries with backoff.
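
A general pattern for the behavior listed above (bounded concurrency, a per-request timeout, and retries with backoff) is sketched below using only `asyncio`. The function names, retry count, and delay values are assumptions chosen for illustration; they are not taken from the `csvai` source.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def call_with_retry(make_call: Callable[[], Awaitable[T]], *, timeout: float,
                          retries: int = 3, base_delay: float = 1.0) -> T:
    """Await make_call() under a timeout, retrying with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")


async def run_all(calls: List[Callable[[], Awaitable[T]]], *,
                  limit: int, timeout: float) -> List[T]:
    """Run every call with at most `limit` requests in flight at once."""
    gate = asyncio.Semaphore(limit)

    async def guarded(make_call: Callable[[], Awaitable[T]]) -> T:
        async with gate:
            return await call_with_retry(make_call, timeout=timeout)

    return await asyncio.gather(*(guarded(c) for c in calls))
```

Here `limit` plays the role of `MAX_CONCURRENT_REQUESTS` and `timeout` the role of `REQUEST_TIMEOUT`.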

---

## Troubleshooting

**“Missing required parameter: `text.format.name`”**
You used structured outputs but didn’t include `name` **alongside** `type` and `schema`. The script already sends this correctly; ensure you’re on the latest version and that `--schema` points to the right file.

**“Invalid schema … `required` must include every key”**
The Responses structured-outputs path expects `required` to include **all** keys in `properties`. Either (a) add them all to `required`, (b) remove non-required keys from `properties`, or (c) use JSON mode.
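
To catch this before sending a request, a schema can be checked locally. The sketch below is a hypothetical helper, not part of `csvai`. It checks the `required` rule described above and, as an added assumption, also checks for `additionalProperties: false`, which both sample schemas in this README set and which strict mode is understood to expect.

```python
from typing import Any, Dict, List


def strict_schema_problems(schema: Dict[str, Any], path: str = "$") -> List[str]:
    """List places where an object schema would likely be rejected in strict mode."""
    problems: List[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {', '.join(missing)}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Recurse into nested object properties.
        for key, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{key}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems
```

An empty list means neither rule was violated; it does not prove the API will accept the schema.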

**Rows not resuming**
Ensure there’s an `id` column in both input and output. If not present, the script uses positional IDs for the current run only.
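
The resume rule can be pictured with a short sketch: collect the ids already in the output file, then skip input rows with those ids. The function names are hypothetical and the code handles CSV output only; it illustrates the idea and makes no claim about how `csvai` implements it.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Iterator, Set


def done_ids(output_path: str, id_col: str = "id") -> Set[str]:
    """Collect the ids already written to the output CSV (empty set if there is no file)."""
    path = Path(output_path)
    if not path.is_file():
        return set()
    with path.open(newline="", encoding="utf-8") as fh:
        return {row[id_col] for row in csv.DictReader(fh) if row.get(id_col)}


def pending_rows(rows: Iterable[Dict[str, str]], done: Set[str],
                 id_col: str = "id") -> Iterator[Dict[str, str]]:
    """Yield only the input rows whose id is not in the output yet."""
    for row in rows:
        if row.get(id_col) not in done:
            yield row
```

If the output file has no `id` column, `done_ids` returns an empty set and every row is treated as pending, which matches the symptom described above.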

---

## FAQ

**Q: Does it run concurrently?**
Yes. Concurrency is controlled via `MAX_CONCURRENT_REQUESTS` (default 10; the sample `.env` above sets it to 12).

**Q: Can I rely on an `id` column?**
Yes. If present in the input CSV, it’s used for resumability. Otherwise rows are indexed for the session.

**Q: Can I output nested JSON?**
The schema can be nested, but CSV is flat. If you want nested data, extend the script with a flattener (e.g., convert `address.street` → `address_street`).
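
A flattener of the kind mentioned above could be as small as the following sketch. The function name `flatten` and the choice to store lists as JSON strings are assumptions for illustration; the tool itself does not ship this helper.

```python
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    out: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # A CSV cell holds one value, so lists are kept as a JSON string.
            out[name] = json.dumps(value, ensure_ascii=False)
        else:
            out[name] = value
    return out
```

For example, `flatten({"address": {"street": "1 Main St"}})` yields a single `address_street` column.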

**Q: Which models work?**
Recent `gpt-4o*` models support Responses + Structured Outputs. If a model doesn’t support it, use JSON mode.

**Q: Do I need a JSON schema?**
No, but it’s strongly recommended for stable columns and fewer parse failures.

---

## Support

This application was developed as an internal tool and we will continue to improve and optimize it as long as we use it. If you would like us to customize this or build a similar or related system to automate your tasks with AI, we are available for **commercial support**.

---

### About Zyxware Technologies

At **Zyxware Technologies**, our mission is to help organizations harness the power of technology to solve real-world problems. Guided by our founding values of honesty and fairness, we are committed to delivering genuine value to our clients and the free and open-source community.

**CSVAI** is a direct result of this philosophy. We originally developed it to automate and streamline our own internal data-enrichment tasks. Realizing its potential to help others, we are sharing it as a free tool in the spirit of our commitment to Free Software.

Our expertise is centered around our **AI & Automation Services**. We specialize in building intelligent solutions that reduce manual effort, streamline business operations, and unlock data-driven insights. While we provide powerful free tools like this one, we also offer **custom development and commercial support** for businesses that require tailored AI solutions.

If you're looking to automate a unique business process or build a similar system, we invite you to [**reach out to us**](https://www.zyxware.com/contact-us) to schedule a free discovery call.

---

## Updates

For updates and new versions, visit: [Project Page @ Zyxware](https://www.zyxware.com/article/6935/csvai-automate-data-enrichment-any-csv-or-excel-file-generative-ai)

---

## Contact

[https://www.zyxware.com/contact-us](https://www.zyxware.com/contact-us)

---

## Source Repository

[https://github.com/zyxware/csvai](https://github.com/zyxware/csvai)

---

## Reporting Issues

[https://github.com/zyxware/csvai/issues](https://github.com/zyxware/csvai/issues)

---

## License and Disclaimer

**GPL v2** – Free to use & modify. Use it at your own risk. We are not collecting any user data.

---

## Need Help or Commercial Support?

If you have any questions, feel free to [contact us](https://www.zyxware.com/contact-us).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "csvai",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "csv, excel, ai, openai, data enrichment, llm, automation, vision, image analysis",
    "author": "Zyxware Technologies, Vimal Joseph",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/79/8c/57d13dcdfdd29ac508ecafbfc32876f392dc328231ee4e426efbf136248a/csvai-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Enrich CSV or Excel rows using OpenAI models (text or image analysis).",
    "version": "0.2.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/zyxware/csvai/issues",
        "Contact": "https://www.zyxware.com/contact-us",
        "Homepage": "https://www.zyxware.com/article/6935/csvai-automate-data-enrichment-any-csv-or-excel-file-generative-ai",
        "Source": "https://github.com/zyxware/csvai"
    },
    "split_keywords": [
        "csv",
        " excel",
        " ai",
        " openai",
        " data enrichment",
        " llm",
        " automation",
        " vision",
        " image analysis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "59d6b1a9e421f9e1538e565894cb5670b8fbb65ee5e62a660de63019e196677e",
                "md5": "9b4a58ed7eb140682d758f44cebb6c14",
                "sha256": "5d716660ed8029b5a77a9214cb2eeef1fea9df48a9ae48f92fd0b223c024ea62"
            },
            "downloads": -1,
            "filename": "csvai-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9b4a58ed7eb140682d758f44cebb6c14",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 26904,
            "upload_time": "2025-09-10T17:17:18",
            "upload_time_iso_8601": "2025-09-10T17:17:18.270733Z",
            "url": "https://files.pythonhosted.org/packages/59/d6/b1a9e421f9e1538e565894cb5670b8fbb65ee5e62a660de63019e196677e/csvai-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "798c57d13dcdfdd29ac508ecafbfc32876f392dc328231ee4e426efbf136248a",
                "md5": "448ada6cbdbc6232dabe5a6dae312bcc",
                "sha256": "3d87818473d69253d62e9313ca143068e16a73fe4e6bae4d6a69e92f4bf05741"
            },
            "downloads": -1,
            "filename": "csvai-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "448ada6cbdbc6232dabe5a6dae312bcc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 30438,
            "upload_time": "2025-09-10T17:17:19",
            "upload_time_iso_8601": "2025-09-10T17:17:19.630618Z",
            "url": "https://files.pythonhosted.org/packages/79/8c/57d13dcdfdd29ac508ecafbfc32876f392dc328231ee4e426efbf136248a/csvai-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-10 17:17:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "zyxware",
    "github_project": "csvai",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "openai",
            "specs": [
                [
                    ">=",
                    "1.99.0"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    ">=",
                    "3.1.4"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    ">=",
                    "3.1.0"
                ]
            ]
        },
        {
            "name": "streamlit",
            "specs": []
        }
    ],
    "lcname": "csvai"
}
        