| Field | Value |
| --- | --- |
| Name | datatalk-cli |
| Version | 0.1.1 |
| Summary | Query CSV and Parquet data with natural language |
| upload_time | 2025-09-01 23:38:55 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | MIT |
| keywords | ai, analysis, csv, data, parquet, sql |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Datatalk
> Query CSV and Parquet data with natural language.
## The Problem
Large Language Models are incredibly powerful at understanding natural language, but they're terrible at crunching numbers. When you ask an LLM to analyze data, it often hallucinates statistics, makes calculation errors, or gets overwhelmed by large datasets, leaving you with unreliable results while burning through expensive tokens.
## How It Works
Datatalk solves these problems by combining the best of both worlds: AI's natural language understanding with local computation's precision and efficiency. Instead of asking an LLM to crunch numbers, Datatalk uses AI only to interpret your questions while doing all calculations locally on your machine.
The tool works by:
1. **Loading your data** - Supports both CSV and Parquet formats
2. **Understanding your question** - Uses AI to interpret natural language queries
3. **Analyzing the data** - Automatically processes your data to find the answers
4. **Returning results** - Provides clear, formatted answers with the underlying data
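The four steps above can be sketched in Python. This is an illustration, not Datatalk's actual implementation: the column names and the hard-coded `plan` stand in for the operation the LLM would derive from your question.

```python
import pandas as pd

def answer(path: str, question: str) -> pd.DataFrame:
    # 1. Load the data -- CSV or Parquet, chosen by file extension.
    df = pd.read_parquet(path) if path.endswith(".parquet") else pd.read_csv(path)

    # 2. Understand the question -- Datatalk asks the AI to turn it into a
    #    concrete data operation; here the resulting "plan" is hard-coded.
    def plan(d: pd.DataFrame) -> pd.DataFrame:
        return d.groupby("region", as_index=False)["revenue"].sum()

    # 3. Analyze the data -- the computation runs locally, on the full dataset.
    # 4. Return a clear, tabular result.
    return plan(df)
```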
Whether you're analyzing marketing campaigns, exploring user behavior, or investigating trends, Datatalk lets you focus on asking the right questions rather than figuring out how to code the answers.
## Why Choose Datatalk
**Privacy & Data Security**: Your raw data never leaves your machine. Only column names and data types are shared with the AI to understand your query structure.
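A minimal sketch of that idea, assuming a pandas-style loader (not Datatalk's actual internals): the payload built for the model contains only column names and dtypes, never cell values.

```python
import pandas as pd

def schema_payload(df: pd.DataFrame) -> dict:
    # Only column names and dtypes would leave the machine under this
    # scheme; no rows or cell values are included in the payload.
    return {col: str(dtype) for col, dtype in df.dtypes.items()}
```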
**Deterministic Calculations**: Unlike direct LLM analysis, numerical computations produce consistent, reproducible results. This eliminates the common problem of AI hallucinating statistics or making arithmetic errors when working with data.
**Token Efficiency**: Large datasets can quickly exhaust LLM token limits and become expensive to process. Datatalk sends only schema metadata and query context to the AI, allowing you to analyze gigabyte-sized files for the cost of a few hundred tokens rather than thousands.
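To make the savings concrete, here is a rough, standard-library-only comparison of payload sizes, with bytes standing in for tokens; the JSON layout of the schema payload is hypothetical.

```python
import json

def payload_sizes(csv_text: str) -> tuple[int, int]:
    # Size of what a schema-only approach sends (the header, as JSON)
    # versus the size of shipping the entire file to the model.
    header = csv_text.splitlines()[0].split(",")
    schema = json.dumps({"columns": header})
    return len(schema), len(csv_text)
```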
**No Code Required**: Traditional data analysis requires knowledge of pandas, SQL, or similar tools. Datatalk translates natural language questions into the appropriate data operations, making exploratory data analysis accessible without programming expertise.
**Multiple Format Support**: Works with both CSV and Parquet files out of the box. Parquet support is particularly useful for large datasets as it provides better compression and faster read times compared to CSV.
## Getting Started
### PyPI Installation
```bash
# Using pip
pip install datatalk-cli

# Using uv
uv add datatalk-cli
```
### Homebrew Installation
```bash
# Add the tap
brew tap tsaplin/datatalk
# Install datatalk
brew install datatalk
```
## Setting Up Your Environment
Datatalk requires Azure OpenAI credentials. You can configure them in two ways:
### Option 1: Interactive Configuration (Recommended)
If no `.env` file is found, the tool will prompt you for the two required values and save them to `~/.config/datatalk/config.json`.
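If the on-disk layout mirrors the two environment values (an assumption; the saved format isn't documented here), the resulting `~/.config/datatalk/config.json` would look roughly like:

```json
{
  "AZURE_DEPLOYMENT_TARGET_URL": "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview",
  "AZURE_OPENAI_API_KEY": "your-api-key-here"
}
```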
### Option 2: Environment File
Create a `.env` file with just these two lines:
```bash
AZURE_DEPLOYMENT_TARGET_URL=https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview
AZURE_OPENAI_API_KEY=your-api-key-here
```
**That's it!** The tool automatically extracts the endpoint, deployment name, and API version from the target URL.
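A sketch of how those three pieces can be recovered from the target URL using only the standard library; this mirrors the URL shape shown above and is not Datatalk's actual parsing code.

```python
from urllib.parse import urlparse, parse_qs

def parse_target_url(url: str) -> dict:
    # Assumes the Azure layout shown above:
    # https://<resource>.openai.azure.com/openai/deployments/<name>/chat/completions?api-version=<ver>
    parts = urlparse(url)
    deployment = parts.path.split("/deployments/")[1].split("/")[0]
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}",
        "deployment": deployment,
        "api_version": parse_qs(parts.query)["api-version"][0],
    }
```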
To view your current configuration:
```bash
datatalk --config-info
```
## Usage Examples with Sample Data
This repository includes sample CSV files in the `sample_data/` folder to help you get started quickly:
```bash
# Analyze sales data interactively
uv run datatalk sample_data/sales_data.csv
```
```bash
# Analyze employee data interactively
uv run datatalk sample_data/employees.csv
```
```bash
# Analyze inventory data interactively
uv run datatalk sample_data/inventory.csv
```
```bash
# Analyze customer data interactively
uv run datatalk sample_data/customers.csv
```
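If you'd rather try the tool on data of your own, a few lines of Python produce a comparable CSV; the columns here are illustrative, not the repository's actual sample schema.

```python
import csv

# Write a small sales file in the spirit of the bundled sample data.
rows = [
    {"date": "2025-01-01", "region": "EU", "revenue": 1200.0},
    {"date": "2025-01-02", "region": "US", "revenue": 950.5},
]
with open("my_sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "region", "revenue"])
    writer.writeheader()
    writer.writerows(rows)
```

Then point the tool at it with `datatalk my_sales.csv` (or `uv run datatalk my_sales.csv` from a checkout).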
## Development
### Development Installation
```bash
# Clone the repository
git clone https://github.com/vtsaplin/datatalk.git
cd datatalk
# Run directly
uv run datatalk --help
```
### Testing
Install test dependencies:
```bash
uv sync --extra test
```
Run all tests:
```bash
uv run pytest
```