# DataAgent
A comprehensive data analysis toolkit with universal sklearn and statsmodels tools for LangGraph agents.
## Overview
DataAgent provides a unified interface for machine learning and statistical analysis, making it easy to integrate with LangGraph agents for automated data analysis workflows. The package includes:
- **Universal Scikit-learn Tools**: Comprehensive machine learning estimators with automated parameter validation and model selection
- **Universal Statsmodels Tools**: Statistical analysis tools including linear models, GLM, nonparametric methods, robust linear models, and ANOVA
## Installation
```bash
pip install datagent
```
For development dependencies:
```bash
pip install datagent[dev]
```
For LangGraph integration:
```bash
pip install datagent[langgraph]
```
## Quick Start
### Using Scikit-learn Tools
```python
import datagent
import pandas as pd
from sklearn.datasets import load_iris
# Load sample data
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
# Use universal sklearn estimator
result = datagent.universal_sklearn_estimator(
    model_name="random_forest_classifier",
    X=X,
    y=y,
    test_size=0.2,
    random_state=42
)
print(f"Accuracy: {result['metrics']['accuracy']:.4f}")
```
### Using Statsmodels Tools
```python
import datagent
import pandas as pd
import numpy as np
# Create sample data
np.random.seed(42)
n = 100
X = np.random.randn(n, 2)
y = 2 * X[:, 0] + 1.5 * X[:, 1] + np.random.randn(n)
df = pd.DataFrame({
    'y': y,
    'x1': X[:, 0],
    'x2': X[:, 1]
})
# Use universal linear model
result = datagent.universal_linear_models(
    model_name="ols",
    data=df,
    formula="y ~ x1 + x2"
)
print(f"R-squared: {result['model_info']['r_squared']:.4f}")
```
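For reference, R-squared is conventionally defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. The sketch below computes it in plain Python. The function name `r_squared` and the sample numbers are illustrative assumptions, not part of the datagent API.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the fitted values fail to explain
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up observations and fitted values, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
print(f"R-squared: {r_squared(observed, fitted):.4f}")
```

Here SS_res is 0.10 and SS_tot is 5.0, so the value is 0.98.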
## Features
### Scikit-learn Tools
- **Classification**: Logistic Regression, Random Forest, SVM, Neural Networks, and more
- **Regression**: Linear Regression, Ridge, Lasso, Elastic Net, and more
- **Clustering**: K-Means, DBSCAN, Hierarchical Clustering, and more
- **Preprocessing**: StandardScaler, LabelEncoder, and more
- **Model Selection**: Cross-validation, hyperparameter tuning
- **Metrics**: Comprehensive evaluation metrics for each task type
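To make the cross-validation item concrete, here is a minimal sketch of how k-fold splitting partitions sample indices. It is written in plain Python and is not taken from datagent or scikit-learn; the helper name `k_fold_indices` is hypothetical. It uses contiguous, unshuffled folds and gives the first `n_samples % n_splits` folds one extra sample.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, n_splits=3))
for train, test in folds:
    print(f"train={train} test={test}")
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training in the remaining folds.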
### Statsmodels Tools
- **Linear Models**: OLS, WLS, GLS, and more
- **Generalized Linear Models (GLM)**: Logistic, Poisson, Gamma, and more
- **Nonparametric Methods**: Kernel density estimation, smoothing
- **Robust Linear Models**: RLM with various M-estimators
- **ANOVA**: Analysis of variance for experimental designs
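As a small worked illustration of the ANOVA item, the sketch below computes the F statistic of a one-way ANOVA by hand: the between-group mean square divided by the within-group mean square. The function name `one_way_anova_f` and the sample groups are assumptions made for this example. It is not datagent's implementation and it does not compute a p-value.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic for a one-way ANOVA: between-group MS / within-group MS."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up measurements for three groups, for illustration only
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(samples)
print(f"F = {f_stat:.2f}")
```

For these groups the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 13.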
## LangGraph Integration
DataAgent is designed to work seamlessly with LangGraph agents. Here's an example:
```python
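from typing import Any, TypedDict
from langgraph.graph import StateGraph
import datagent
# Shared state passed between graph nodes
class AnalysisState(TypedDict, total=False):
    data: Any
    target: Any
    analysis_result: Any
# Create a LangGraph tool
sklearn_tool = datagent.create_sklearn_langgraph_tool()
# Use in your agent workflow
def analyze_data(state: AnalysisState) -> dict:
    # Your data analysis logic here
    result = sklearn_tool.invoke({
        "model_name": "random_forest_classifier",
        "X": state["data"],
        "y": state["target"]
    })
    return {"analysis_result": result}
# Build your graph (StateGraph requires a state schema)
workflow = StateGraph(AnalysisState)
workflow.add_node("analyze", analyze_data)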
```
## API Reference
### Scikit-learn Functions
- `universal_sklearn_estimator()`: Main function for sklearn model training
- `extract_sklearn_model_info()`: Extract model information
- `get_sklearn_tool_description()`: Get tool description for LangGraph
- `create_sklearn_langgraph_tool()`: Create LangGraph tool
- `get_available_sklearn_models()`: List available models
- `validate_sklearn_parameters()`: Validate model parameters
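The listing and validation helpers above suggest a name-to-estimator lookup with a parameter check before training. The sketch below shows one common way to build that pattern with the standard library's `inspect.signature`. Everything in it (`ToyClassifier`, `MODEL_REGISTRY`, `available_models`, `validate_parameters`) is hypothetical and only illustrates the idea; datagent's actual signatures and return values may differ.

```python
import inspect
from typing import Any, Callable, Dict, List


class ToyClassifier:
    """Stand-in estimator used only for this illustration."""

    def __init__(self, n_estimators: int = 100, max_depth: int = 3):
        self.n_estimators = n_estimators
        self.max_depth = max_depth


# Hypothetical registry mapping model names to constructors
MODEL_REGISTRY: Dict[str, Callable[..., Any]] = {"toy_classifier": ToyClassifier}


def available_models() -> List[str]:
    """Return the registered model names in sorted order."""
    return sorted(MODEL_REGISTRY)


def validate_parameters(model_name: str, params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    if model_name not in MODEL_REGISTRY:
        return [f"unknown model: {model_name}"]
    # Compare the supplied names with the constructor's accepted keyword arguments
    accepted = inspect.signature(MODEL_REGISTRY[model_name]).parameters
    return [f"unexpected parameter: {name}" for name in params if name not in accepted]


print(available_models())
print(validate_parameters("toy_classifier", {"n_estimators": 50}))
print(validate_parameters("toy_classifier", {"n_trees": 50}))
```

Returning a list of problems instead of raising lets an agent report every invalid parameter in one pass and retry with corrected arguments.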
### Statsmodels Functions
- `universal_linear_models()`: Linear model analysis
- `universal_glm()`: Generalized linear model analysis
- `universal_nonparametric()`: Nonparametric analysis
- `universal_rlm()`: Robust linear model analysis
- `universal_anova()`: ANOVA analysis
Each function has corresponding helper functions for model info extraction, tool creation, and parameter validation.
## Examples
See the `examples/` directory for comprehensive examples:
- **`basic_usage.py`** - Basic usage demonstration (Python script)
- **`basic_usage.ipynb`** - Interactive Jupyter notebook with basic usage
- **`langgraph_integration.py`** - LangGraph integration example (Python script)
- **`langgraph_integration.ipynb`** - Interactive Jupyter notebook with LangGraph integration
### Running Examples
**Python Scripts:**
```bash
python examples/basic_usage.py
python examples/langgraph_integration.py
```
**Jupyter Notebooks:**
```bash
jupyter notebook examples/basic_usage.ipynb
jupyter notebook examples/langgraph_integration.ipynb
```
## Contributing
We welcome contributions! Please see our contributing guidelines for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Support
- Documentation: [https://datagent.readthedocs.io/](https://datagent.readthedocs.io/)
- Issues: [https://github.com/superpandas-ai/datagent/issues](https://github.com/superpandas-ai/datagent/issues)
- Discussions: [https://github.com/superpandas-ai/datagent/discussions](https://github.com/superpandas-ai/datagent/discussions)
## Citation
If you use DataAgent in your research, please cite:
```bibtex
@software{datagent2024,
  title={DataAgent: A comprehensive data analysis toolkit for LangGraph agents},
  author={DataAgent Team},
  year={2024},
  url={https://github.com/superpandas-ai/datagent}
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "datagent",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Haris Jabbar <haris@superpandas.ai>",
"keywords": "machine-learning, scikit-learn, statsmodels, data-analysis, langgraph, langchain",
"author": null,
"author_email": "Haris Jabbar <haris@superpandas.ai>",
"download_url": "https://files.pythonhosted.org/packages/cc/0a/268e5ed4d1b50e78248e6e7c9d226d50882059741a7a7f2890331b4de403/datagent-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A comprehensive data analysis toolkit with universal sklearn and statsmodels tools for LangGraph agents",
"version": "0.0.1",
"project_urls": {
"Bug Tracker": "https://github.com/superpandas-ai/datagent/issues",
"Documentation": "https://datagent.readthedocs.io/",
"Homepage": "https://github.com/superpandas-ai/datagent",
"Repository": "https://github.com/superpandas-ai/datagent"
},
"split_keywords": [
"machine-learning",
" scikit-learn",
" statsmodels",
" data-analysis",
" langgraph",
" langchain"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b3b009f6a2d09fb9724345c58380578f4b1f027fc684d7a0a18373cb5370c3ba",
"md5": "3835abb373346bbcd597e0b561a5b2e6",
"sha256": "8108cf197397fa0122d7b9c4cb5303b23e733c700f71cc51eac98a03bec15f2b"
},
"downloads": -1,
"filename": "datagent-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3835abb373346bbcd597e0b561a5b2e6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 7130,
"upload_time": "2025-08-24T17:44:19",
"upload_time_iso_8601": "2025-08-24T17:44:19.044757Z",
"url": "https://files.pythonhosted.org/packages/b3/b0/09f6a2d09fb9724345c58380578f4b1f027fc684d7a0a18373cb5370c3ba/datagent-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cc0a268e5ed4d1b50e78248e6e7c9d226d50882059741a7a7f2890331b4de403",
"md5": "15554b8b77e37b9e46c90a8b55bff4c8",
"sha256": "7cceefca12bb01039f6fb72484af3ae0bda92eceda40a8663971fd5e163c116a"
},
"downloads": -1,
"filename": "datagent-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "15554b8b77e37b9e46c90a8b55bff4c8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 150830,
"upload_time": "2025-08-24T17:44:20",
"upload_time_iso_8601": "2025-08-24T17:44:20.514781Z",
"url": "https://files.pythonhosted.org/packages/cc/0a/268e5ed4d1b50e78248e6e7c9d226d50882059741a7a7f2890331b4de403/datagent-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-24 17:44:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "superpandas-ai",
"github_project": "datagent",
"github_not_found": true,
"lcname": "datagent"
}