# 🌟 Orion - AI-Powered Data Analysis Agent
An intelligent data analysis agent that transforms natural language questions into SQL queries, executes them on BigQuery, performs statistical analysis, and generates actionable business insights.
🔗 **GitHub**: https://github.com/gavrielhan/orion-data-analyst
📦 **PyPI**: https://pypi.org/project/orion-data-analyst/
---
## ✨ What is Orion?

Orion is your AI business analyst that:
- **Understands natural language** - Ask questions in plain English
- **Generates smart SQL** - Powered by Google Gemini AI
- **Analyzes data automatically** - Statistical analysis, trends, segmentation
- **Provides insights** - Actionable recommendations with business context
- **Creates visualizations** - Charts saved automatically
- **Self-heals errors** - Automatically fixes and retries failed queries
- **Remembers conversations** - Handles follow-up questions with context
Built with **LangGraph** for modular AI reasoning and **Google BigQuery** for data warehousing.
---
## 🚀 Quick Start
### Installation
**Option 1: Install from PyPI (Recommended)**
```bash
pip install orion-data-analyst
```
**Option 2: Install from Source**
```bash
git clone https://github.com/gavrielhan/orion-data-analyst.git
cd orion-data-analyst
pip install -e .
```
### Setup
1. **Get API Keys** (see [GETTING_KEYS.md](GETTING_KEYS.md)):
   - Google Cloud Project ID
   - Google Cloud service account JSON key
   - Gemini API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
2. **Configure `.env` file**:
```bash
# Copy the example file
cp .env.example .env
# Then edit .env and set your credentials:
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GEMINI_API_KEY=your-gemini-api-key
```
3. **Run Orion**:
```bash
orion
```
---
## 💡 Usage Examples
### Basic Queries
```
You: show me top 10 products by revenue
Orion: [Generates SQL, executes, analyzes, and displays ranked results]
You: what are the sales trends for the last 6 months?
Orion: [Creates time-series analysis with month-over-month growth]
You: segment customers by purchase behavior
Orion: [Performs customer segmentation and analysis]
```
### Follow-up Questions
```
You: show top customers
Orion: [Displays ranked customer list]
You: show the same for the last quarter
Orion: [Uses conversation context to apply date filter]
You: break that down by region
Orion: [Further segments the previous results]
```
### Visualizations & Exports
```
You: create a bar chart of sales by category
Orion: [Generates chart and saves to ~/orion_results/]
You: save this as csv
Orion: [Exports results to ~/orion_results/results_TIMESTAMP.csv]
```
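As a rough illustration of the export step, the sketch below writes result rows to a timestamped CSV file in the output directory. The function name `export_csv` and the timestamp format are assumptions made for this example; Orion's actual file naming may differ.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def export_csv(
    rows: List[Dict[str, object]],
    output_dir: str = "~/orion_results",
    now: Optional[datetime] = None,
) -> Path:
    """Write rows to <output_dir>/results_<timestamp>.csv and return the path."""
    if not rows:
        raise ValueError("nothing to export")
    directory = Path(output_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    # Hypothetical timestamp format; the README only shows "TIMESTAMP".
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    path = directory / f"results_{stamp}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```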
### Meta-Questions (Instant Responses)
```
You: what can you do?
Orion: [Explains capabilities without querying database]
You: which datasets can you query?
Orion: [Lists available tables and schemas]
```
---
## 🏗️ Architecture
Orion uses a **modular node-based architecture** powered by LangGraph:
### High-Level Architecture
*(Diagram: `assets/high_level_schema.png`)*
### Detailed Graph Schema
*(Diagram: `assets/graph_schema.png`)*
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed component descriptions.
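To make the node-based idea concrete, here is a minimal plain-Python sketch of nodes that read and return a shared `TypedDict` state, with a router that picks the next node. It deliberately does not use the LangGraph API, and the node names, state fields, and placeholder logic are all assumptions for illustration rather than Orion's real graph.

```python
from typing import Callable, Dict, Optional, TypedDict


class AgentState(TypedDict, total=False):
    """Shared state passed between nodes (field names are illustrative)."""
    question: str
    sql: str
    rows: list
    answer: str


def classify(state: AgentState) -> AgentState:
    # Meta-questions are answered directly, without querying the database.
    if state["question"].lower().startswith("what can you do"):
        return {**state, "answer": "I turn questions into SQL and analyze the results."}
    return state


def generate_sql(state: AgentState) -> AgentState:
    # Placeholder for the LLM call that would produce SQL.
    return {**state, "sql": "SELECT 1 AS value"}


def execute(state: AgentState) -> AgentState:
    # Placeholder for the BigQuery call that would run the SQL.
    return {**state, "rows": [{"value": 1}]}


def summarize(state: AgentState) -> AgentState:
    return {**state, "answer": f"{len(state['rows'])} row(s) returned"}


NODES: Dict[str, Callable[[AgentState], AgentState]] = {
    "classify": classify,
    "generate_sql": generate_sql,
    "execute": execute,
    "summarize": summarize,
}


def route(current: str, state: AgentState) -> Optional[str]:
    """Return the next node name, or None when the run is finished."""
    if current == "classify":
        return None if "answer" in state else "generate_sql"
    return {"generate_sql": "execute", "execute": "summarize"}.get(current)


def run(question: str) -> AgentState:
    state: AgentState = {"question": question}
    node: Optional[str] = "classify"
    while node is not None:
        state = NODES[node](state)
        node = route(node, state)
    return state
```

Because each node only depends on the state it receives, individual nodes can be tested or replaced in isolation, which is the main appeal of this layout.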
---
## 🎯 Key Features
### 🤖 Intelligent SQL Generation
- Natural language to SQL using Google Gemini
- Automatic schema context injection
- Self-healing with error feedback loops (max 3 retries)
- Handles complex JOINs across multiple tables
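The self-healing loop can be sketched as follows: when a query fails, the error message is fed back into the next generation attempt. This is a hedged illustration only. `generate_sql` and `execute_sql` are hypothetical callables standing in for the Gemini and BigQuery calls, and the sketch assumes "max 3 retries" means one initial attempt plus up to three retries.

```python
from typing import Any, Callable, List

MAX_RETRIES = 3  # mirrors the "max 3 retries" limit listed above


def run_with_self_healing(
    question: str,
    generate_sql: Callable[[str, List[str]], str],
    execute_sql: Callable[[str], Any],
    max_retries: int = MAX_RETRIES,
) -> Any:
    """Generate SQL, run it, and retry with error feedback on failure."""
    errors: List[str] = []
    for _ in range(max_retries + 1):  # initial attempt plus the retries
        # Previous errors are passed in so the generator can correct itself.
        sql = generate_sql(question, errors)
        try:
            return execute_sql(sql)
        except Exception as exc:  # broad on purpose: any failure is fed back
            errors.append(f"{sql!r} failed: {exc}")
    raise RuntimeError(f"gave up after {max_retries} retries: {errors[-1]}")
```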
### 🛡️ Safety & Validation
- Blocks destructive statements (DROP, DELETE, etc.)
- BigQuery cost estimation before execution
- Query syntax validation with dry-run
- Row limits to prevent runaway queries
- Human-in-the-loop approval for expensive operations
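A minimal sketch of two of these checks, assuming a simple keyword scan and a caller-supplied price: the keyword list beyond DROP and DELETE, the function names, and the pricing parameter are assumptions for this example, not Orion's actual rules. In a real run, the byte count would typically come from a BigQuery dry-run job, which reports the bytes a query would process without executing it.

```python
import re
from typing import Optional

# Only DROP and DELETE are named in the README; the rest are assumed here.
BLOCKED_KEYWORDS = ("DROP", "DELETE", "TRUNCATE", "ALTER", "INSERT", "UPDATE", "MERGE", "CREATE")

# Quoted string literals are removed first so that a value such as
# 'please delete' does not trigger a false positive.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
_BLOCKED = re.compile(r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b", re.IGNORECASE)


def find_blocked_keyword(sql: str) -> Optional[str]:
    """Return the first blocked keyword found outside string literals, if any."""
    without_literals = _STRING_LITERAL.sub("''", sql)
    match = _BLOCKED.search(without_literals)
    return match.group(1).upper() if match else None


def estimate_cost_usd(bytes_processed: int, usd_per_tib: float) -> float:
    """Convert a byte estimate into a dollar figure.

    usd_per_tib must be supplied by the caller; check current BigQuery pricing.
    """
    return bytes_processed / 2**40 * usd_per_tib
```

A keyword scan like this is a coarse first filter; running the agent with read-only credentials is a stronger safeguard.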
### 📊 Advanced Analytics
- **Ranking**: Top N analysis with contribution %
- **Trends**: Time-series with growth rates
- **Segmentation**: Group-by analysis
- **RFM Analysis**: Customer segmentation (Recency, Frequency, Monetary)
- **Anomaly Detection**: Outlier identification
- **Comparative Analysis**: Period-over-period comparison
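Two of these analyses reduce to short formulas. The sketch below computes top-N contribution percentages and period-over-period growth rates in plain Python; the function names are illustrative, and Orion itself presumably uses pandas for this.

```python
from typing import Dict, List, Optional, Tuple


def top_n_with_contribution(
    totals: Dict[str, float], n: int
) -> List[Tuple[str, float, float]]:
    """Rank items by value and attach each item's share of the grand total (%)."""
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
    return [
        (name, value, round(100 * value / grand_total, 1) if grand_total else 0.0)
        for name, value in ranked
    ]


def period_growth(values: List[float]) -> List[Optional[float]]:
    """Percentage change versus the previous period; None when undefined."""
    growth: List[Optional[float]] = [None]  # the first period has no predecessor
    for previous, current in zip(values, values[1:]):
        growth.append(
            round(100 * (current - previous) / previous, 1) if previous else None
        )
    return growth
```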
### 💬 Conversation Memory
- Remembers last 5 interactions
- Context-aware follow-up questions
- Session save/load for long conversations
- Automatic context pruning for token efficiency
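A bounded memory like this can be sketched with `collections.deque`, which drops the oldest entry once the limit is reached. The class and method names below are assumptions for illustration, and JSON is used as a stand-in for whatever session format Orion actually writes.

```python
import json
from collections import deque
from typing import Deque, Dict


class ConversationMemory:
    """Keeps only the most recent question/answer pairs."""

    def __init__(self, max_turns: int = 5) -> None:
        # deque(maxlen=...) discards the oldest turn automatically.
        self._turns: Deque[Dict[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self._turns.append({"question": question, "answer": answer})

    def as_prompt_context(self) -> str:
        """Render the retained turns as text to prepend to the next prompt."""
        return "\n".join(
            f"User: {turn['question']}\nOrion: {turn['answer']}"
            for turn in self._turns
        )

    def dumps(self) -> str:
        """Serialize the session so it can be saved to disk."""
        return json.dumps(list(self._turns))

    @classmethod
    def loads(cls, payload: str, max_turns: int = 5) -> "ConversationMemory":
        memory = cls(max_turns)
        for turn in json.loads(payload):
            memory.add(turn["question"], turn["answer"])
        return memory
```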
### 📈 Visualizations
- **Chart Types**: Bar, Line, Pie, Scatter, Box, Candle
- Auto-saved to `~/orion_results/` (configurable)
- Smart chart type selection based on data
- CSV export for further analysis
### ⚡ Performance Optimizations
- **Query Caching**: Instant responses for repeated queries (1-hour TTL)
- **Schema Caching**: Reduces API calls to BigQuery metadata
- **Rate Limiting**: Token bucket algorithm for Gemini API
- **Streaming**: Large result handling
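The caching and rate-limiting ideas above can be sketched in a few lines each. The classes below are a generic TTL cache and a generic token bucket written for this example; they are not Orion's `cache.py` or `rate_limiter.py`, and the injectable `clock` parameter is an assumption added to make the behavior easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Stores values that expire after ttl_seconds (3600 s matches the 1-hour TTL)."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = capacity
        self._tokens = capacity
        self._clock = clock
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

A natural cache key would be a hash of the normalized SQL text, so that repeated questions producing the same query hit the cache.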
### 🎨 Polished UX
- Colored terminal output with formatted text
- Progress indicators at each step
- Helpful error messages with action links
- Startup validation with setup guidance
---
## 🗂️ Project Structure
```
orion-data-analyst/
├── assets/                      # Images and diagrams
│   ├── orion_face.png           # Main interface screenshot
│   ├── high_level_schema.png    # High-level architecture diagram
│   └── graph_schema.png         # Detailed graph flow diagram
├── src/
│   ├── __init__.py
│   ├── cli.py                   # CLI interface with session management
│   ├── config.py                # Configuration loader (.env)
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── graph.py             # LangGraph workflow orchestration
│   │   ├── nodes.py             # All 10 agent nodes
│   │   └── state.py             # Centralized AgentState (TypedDict)
│   └── utils/
│       ├── __init__.py
│       ├── cache.py             # Query result caching
│       ├── formatter.py         # ANSI terminal formatting
│       ├── rate_limiter.py      # API rate limiting
│       ├── schema_fetcher.py    # BigQuery schema utilities
│       └── visualizer.py        # Chart generation (matplotlib/seaborn)
├── tests/                       # Test suite
│   ├── __init__.py
│   ├── test_nodes.py            # Node unit tests
│   └── test_graph.py            # Graph integration tests
├── .env.example                 # Configuration template
├── requirements.txt             # Dependencies
├── setup.py                     # PyPI packaging
├── pyproject.toml               # Modern Python packaging
├── install.sh                   # One-line installer
├── ARCHITECTURE.md              # Detailed architecture docs
├── GETTING_KEYS.md              # API key setup guide
└── README.md                    # This file
```
---
## ⚙️ Configuration
All configuration is done through a `.env` file:
```bash
# REQUIRED
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GEMINI_API_KEY=your-gemini-api-key
# OPTIONAL
GEMINI_MODEL=gemini-2.0-flash-exp # Choose Gemini model
ORION_OUTPUT_DIR=~/orion_results # Results directory
BIGQUERY_DATASET=bigquery-public-data.thelook_ecommerce
MAX_QUERY_ROWS=10000 # Row limit
QUERY_TIMEOUT=300 # Timeout (seconds)
```
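A hedged sketch of how these settings might be read and validated at startup: the `OrionConfig` dataclass and `load_config` function are hypothetical names, and the sketch assumes the optional values shown above are also the defaults. It reads from a mapping such as `os.environ`, which a `.env` loader like python-dotenv would populate beforehand.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS", "GEMINI_API_KEY")


@dataclass(frozen=True)
class OrionConfig:
    project: str
    credentials_path: str
    gemini_api_key: str
    gemini_model: str
    output_dir: str
    dataset: str
    max_query_rows: int
    query_timeout: int


def load_config(env: Mapping[str, str] = os.environ) -> OrionConfig:
    """Validate required settings and apply (assumed) defaults to optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return OrionConfig(
        project=env["GOOGLE_CLOUD_PROJECT"],
        credentials_path=env["GOOGLE_APPLICATION_CREDENTIALS"],
        gemini_api_key=env["GEMINI_API_KEY"],
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.0-flash-exp"),
        output_dir=os.path.expanduser(env.get("ORION_OUTPUT_DIR", "~/orion_results")),
        dataset=env.get("BIGQUERY_DATASET", "bigquery-public-data.thelook_ecommerce"),
        max_query_rows=int(env.get("MAX_QUERY_ROWS", "10000")),
        query_timeout=int(env.get("QUERY_TIMEOUT", "300")),
    )
```

Failing fast on missing keys gives a clearer error than a later authentication failure deep inside a query.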
---
## 📊 Dataset
By default, Orion queries Google BigQuery's public e-commerce dataset:
- **Dataset**: `bigquery-public-data.thelook_ecommerce`
- **Tables**: `orders`, `order_items`, `products`, `users`
- **Schema**: Automatically loaded with column descriptions
---
## 🔧 Development
### Run from Source
```bash
git clone https://github.com/gavrielhan/orion-data-analyst.git
cd orion-data-analyst
pip install -e .
orion
```
---
## 📝 Commands
In the Orion CLI:
- `exit` / `quit` / `q` - Exit Orion
- `save session` - Save conversation history
- `load session [path]` - Load previous session
- `clear cache` - Clear query cache
---
## 🛠️ Technology Stack
| Component | Technology |
|-----------|-----------|
| **AI Orchestration** | LangGraph |
| **LLM Integration** | LangChain |
| **AI Model** | Google Gemini 2.0 Flash |
| **Data Warehouse** | Google BigQuery |
| **Data Processing** | pandas |
| **Visualization** | matplotlib, seaborn |
| **State Management** | TypedDict (Python) |
| **Configuration** | python-dotenv |
| **Packaging** | setuptools, PyPI |
---
## 📜 License
MIT License - see [LICENSE](LICENSE) file for details.
---
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
---
## 🙏 Acknowledgments
- Built with [LangGraph](https://github.com/langchain-ai/langgraph) by LangChain
- Powered by [Google Gemini](https://ai.google.dev/)
- Data from [BigQuery Public Datasets](https://cloud.google.com/bigquery/public-data)
---
**Made with ❤️ by Gavriel Hannuna**
Raw data
{
"_id": null,
"home_page": "https://github.com/gavrielhan/orion-data-analyst",
"name": "orion-data-analyst",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "data-analysis, bigquery, ai, nlp, sql, gemini, analytics",
"author": "Gavriel Hannuna",
"author_email": "Gavriel Hannuna <gavriel.hannuna@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ea/87/5c55d8f2acfb84f57c72be7f70c37c1457b0da41c66f29bd504a28ccfd3f/orion_data_analyst-1.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "AI-powered BigQuery data analysis agent with natural language interface",
"version": "1.1.2",
"project_urls": {
"Bug Reports": "https://github.com/gavrielhan/orion-data-analyst/issues",
"Documentation": "https://github.com/gavrielhan/orion-data-analyst#readme",
"Homepage": "https://github.com/gavrielhan/orion-data-analyst",
"Source": "https://github.com/gavrielhan/orion-data-analyst"
},
"split_keywords": [
"data-analysis",
" bigquery",
" ai",
" nlp",
" sql",
" gemini",
" analytics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "483a2304d84192811690d5d277fc26d57c2db6d39ebf6faf143d5761cc0a7964",
"md5": "b5b2bfb07b0d3f47e1fe88cad0d72854",
"sha256": "9c9533d3ff39f72ecaf6093c0e45ff1d52b26e3342b75dbc5998ccbd02fb1903"
},
"downloads": -1,
"filename": "orion_data_analyst-1.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b5b2bfb07b0d3f47e1fe88cad0d72854",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 49907,
"upload_time": "2025-11-06T12:32:42",
"upload_time_iso_8601": "2025-11-06T12:32:42.543393Z",
"url": "https://files.pythonhosted.org/packages/48/3a/2304d84192811690d5d277fc26d57c2db6d39ebf6faf143d5761cc0a7964/orion_data_analyst-1.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ea875c55d8f2acfb84f57c72be7f70c37c1457b0da41c66f29bd504a28ccfd3f",
"md5": "339917d824a7a3a222d1f49c394b4eb5",
"sha256": "3cc773d4ae1a98e483d68e8544a1cf1f0dec80e95937a36d60baf67d78e09902"
},
"downloads": -1,
"filename": "orion_data_analyst-1.1.2.tar.gz",
"has_sig": false,
"md5_digest": "339917d824a7a3a222d1f49c394b4eb5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 453341,
"upload_time": "2025-11-06T12:32:44",
"upload_time_iso_8601": "2025-11-06T12:32:44.402895Z",
"url": "https://files.pythonhosted.org/packages/ea/87/5c55d8f2acfb84f57c72be7f70c37c1457b0da41c66f29bd504a28ccfd3f/orion_data_analyst-1.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-06 12:32:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "gavrielhan",
"github_project": "orion-data-analyst",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "langgraph",
"specs": [
[
"==",
"0.2.45"
]
]
},
{
"name": "langchain",
"specs": [
[
"==",
"0.3.0"
]
]
},
{
"name": "google-cloud-bigquery",
"specs": [
[
"==",
"3.25.0"
]
]
},
{
"name": "google-generativeai",
"specs": []
},
{
"name": "db-dtypes",
"specs": []
},
{
"name": "pandas",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
"==",
"1.0.1"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.9.2"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
"==",
"4.12.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.7.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
">=",
"0.12.0"
]
]
}
],
"lcname": "orion-data-analyst"
}