datagent

Name: datagent
Version: 0.0.1
Summary: A comprehensive data analysis toolkit with universal sklearn and statsmodels tools for LangGraph agents
Upload time: 2025-08-24 17:44:20
Requires Python: >=3.8
License: MIT
Keywords: machine-learning scikit-learn statsmodels data-analysis langgraph langchain
Requirements: No requirements were recorded.
# DataAgent

A comprehensive data analysis toolkit with universal sklearn and statsmodels tools for LangGraph agents.

## Overview

DataAgent provides a unified interface for machine learning and statistical analysis, making it easy to integrate with LangGraph agents for automated data analysis workflows. The package includes:

- **Universal Scikit-learn Tools**: Comprehensive machine learning estimators with automated parameter validation and model selection
- **Universal Statsmodels Tools**: Statistical analysis tools including linear models, GLM, nonparametric methods, robust linear models, and ANOVA

## Installation

```bash
pip install datagent
```

For development dependencies:
```bash
pip install "datagent[dev]"
```

For LangGraph integration:
```bash
pip install "datagent[langgraph]"
```

## Quick Start

### Using Scikit-learn Tools

```python
import datagent
import pandas as pd
from sklearn.datasets import load_iris

# Load sample data
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

# Use universal sklearn estimator
result = datagent.universal_sklearn_estimator(
    model_name="random_forest_classifier",
    X=X,
    y=y,
    test_size=0.2,
    random_state=42
)

print(f"Accuracy: {result['metrics']['accuracy']:.4f}")
```
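For readers who want to see roughly what such a call involves, here is a minimal sketch in plain scikit-learn. It assumes (the package docs do not confirm this) that `universal_sklearn_estimator` splits the data using `test_size` and `random_state`, fits the named estimator, and returns a dictionary with a `metrics` entry. The function `fit_and_score` is hypothetical and is not part of datagent.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, test_size=0.2, random_state=42):
    """Hypothetical stand-in for a 'universal' estimator call (not datagent's code)."""
    # Hold out a test split using the same knobs as the Quick Start call
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = RandomForestClassifier(random_state=random_state)
    model.fit(X_train, y_train)
    # Score on the held-out split only
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return {"model": model, "metrics": {"accuracy": accuracy}}


iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

result = fit_and_score(X, y)
print(f"Accuracy: {result['metrics']['accuracy']:.4f}")
```

The returned dictionary mirrors the `result['metrics']['accuracy']` access in the Quick Start, so the two snippets can be compared side by side.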

### Using Statsmodels Tools

```python
import datagent
import pandas as pd
import numpy as np

# Create sample data
np.random.seed(42)
n = 100
X = np.random.randn(n, 2)
y = 2 * X[:, 0] + 1.5 * X[:, 1] + np.random.randn(n)

df = pd.DataFrame({
    'y': y,
    'x1': X[:, 0],
    'x2': X[:, 1]
})

# Use universal linear model
result = datagent.universal_linear_models(
    model_name="ols",
    data=df,
    formula="y ~ x1 + x2"
)

print(f"R-squared: {result['model_info']['r_squared']:.4f}")
```
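The R-squared printed above can be cross-checked by hand. This sketch refits the same synthetic data with NumPy least squares and computes the centered R-squared, 1 - SSR/SST, which is what statsmodels' OLS reports as `rsquared` for a model with an intercept. It assumes, without confirmation from the package docs, that `universal_linear_models` reports that same quantity under `model_info['r_squared']`.

```python
import numpy as np

# Same synthetic data as the Quick Start example
np.random.seed(42)
n = 100
X = np.random.randn(n, 2)
y = 2 * X[:, 0] + 1.5 * X[:, 1] + np.random.randn(n)

# Design matrix with an intercept column, matching the formula "y ~ x1 + x2"
design = np.column_stack([np.ones(n), X])

# Ordinary least squares fit: minimise ||y - design @ beta||^2
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# R-squared = 1 - SSR / SST (residual vs. total sum of squares around the mean)
residuals = y - design @ beta
ssr = np.sum(residuals ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ssr / sst

print(f"Coefficients (intercept, x1, x2): {np.round(beta, 3)}")
print(f"R-squared: {r_squared:.4f}")
```

The fitted slopes should land near the true values of 2 and 1.5 used to generate `y`.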

## Features

### Scikit-learn Tools

- **Classification**: Logistic Regression, Random Forest, SVM, Neural Networks, and more
- **Regression**: Linear Regression, Ridge, Lasso, Elastic Net, and more
- **Clustering**: K-Means, DBSCAN, Hierarchical Clustering, and more
- **Preprocessing**: StandardScaler, LabelEncoder, and more
- **Model Selection**: Cross-validation, hyperparameter tuning
- **Metrics**: Comprehensive evaluation metrics for each task type
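Preprocessing, cross-validation, and hyperparameter tuning usually combine into one workflow. The sketch below shows that combination in plain scikit-learn (a `Pipeline` inside `GridSearchCV`). It illustrates the underlying technique only and is not datagent's API; the grid values are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing and the estimator in one pipeline, so scaling is refit inside each CV fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter grid; "clf__C" addresses the C parameter of the "clf" step
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validated grid search scored by accuracy
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best params:", search.best_params_)
print(f"Mean CV accuracy: {search.best_score_:.4f}")
```

Keeping the scaler inside the pipeline prevents statistics from the validation folds from leaking into training.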

### Statsmodels Tools

- **Linear Models**: OLS, WLS, GLS, and more
- **Generalized Linear Models (GLM)**: Logistic, Poisson, Gamma, and more
- **Nonparametric Methods**: Kernel density estimation, smoothing
- **Robust Linear Models**: RLM with various M-estimators
- **ANOVA**: Analysis of variance for experimental designs
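To make "RLM with various M-estimators" concrete, here is a minimal sketch of a Huber M-estimator fitted by iteratively reweighted least squares (IRLS) in NumPy. The threshold 1.345 and the normalised-MAD scale follow common practice and the ideas behind statsmodels' `RLM` with a Huber norm. This is not statsmodels' or datagent's implementation, so the numbers may differ slightly from either; `huber_irls` and `mad_scale` are hypothetical helpers.

```python
import numpy as np


def mad_scale(residuals):
    # Normalised median absolute deviation: a robust estimate of the residual scale
    return np.median(np.abs(residuals - np.median(residuals))) / 0.6745


def huber_irls(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimator fitted by iteratively reweighted least squares (IRLS)."""
    # Start from the ordinary least squares solution
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        residuals = y - X @ beta
        u = np.abs(residuals) / mad_scale(residuals)
        # Huber weights: 1 when |u| <= c, c/|u| otherwise (written to avoid dividing by 0)
        weights = c / np.maximum(u, c)
        # Weighted least squares step: scale rows by sqrt(weight)
        sw = np.sqrt(weights)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Straight line with small noise, plus gross outliers at the high-x end
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y[-5:] += 30.0

X = np.column_stack([np.ones_like(x), x])
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
huber_beta = huber_irls(X, y)

print(f"OLS slope:   {ols_beta[1]:.3f}")
print(f"Huber slope: {huber_beta[1]:.3f}")
```

The OLS slope is dragged upward by the five outliers, while the Huber fit caps each point's influence and stays close to the true slope of 2.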

## LangGraph Integration

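DataAgent can expose its estimators as LangGraph tools that you call from graph nodes. Here's an example:

```python
from typing import Any, TypedDict

import pandas as pd
from langgraph.graph import END, START, StateGraph

import datagent


# StateGraph needs a state schema; these keys mirror what the node reads and writes
class AnalysisState(TypedDict, total=False):
    data: pd.DataFrame
    target: pd.Series
    analysis_result: Any


# Create a LangGraph tool
sklearn_tool = datagent.create_sklearn_langgraph_tool()


# Use in your agent workflow: a node reads the state and returns a partial update
def analyze_data(state: AnalysisState) -> dict:
    result = sklearn_tool.invoke({
        "model_name": "random_forest_classifier",
        "X": state["data"],
        "y": state["target"],
    })
    return {"analysis_result": result}


# Build and compile your graph
workflow = StateGraph(AnalysisState)
workflow.add_node("analyze", analyze_data)
workflow.add_edge(START, "analyze")
workflow.add_edge("analyze", END)
app = workflow.compile()
```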

## API Reference

### Scikit-learn Functions

- `universal_sklearn_estimator()`: Main function for sklearn model training
- `extract_sklearn_model_info()`: Extract model information
- `get_sklearn_tool_description()`: Get tool description for LangGraph
- `create_sklearn_langgraph_tool()`: Create LangGraph tool
- `get_available_sklearn_models()`: List available models
- `validate_sklearn_parameters()`: Validate model parameters
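The helper names `get_available_sklearn_models()` and `validate_sklearn_parameters()` suggest a name-to-estimator registry with parameter checking. The sketch below shows one way to build that pattern on scikit-learn's `get_params()`. `MODEL_REGISTRY`, `available_models`, `validate_params`, and `build_model` are hypothetical names for illustration and are not datagent's API.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Ridge

# Hypothetical registry mapping tool-friendly names to estimator classes
MODEL_REGISTRY = {
    "random_forest_classifier": RandomForestClassifier,
    "logistic_regression": LogisticRegression,
    "ridge": Ridge,
    "kmeans": KMeans,
}


def available_models():
    """Return the registered model names in sorted order."""
    return sorted(MODEL_REGISTRY)


def validate_params(model_name, params):
    """Return a list of problems; an empty list means the parameters are accepted."""
    if model_name not in MODEL_REGISTRY:
        return [f"unknown model: {model_name!r}"]
    # get_params() on a default instance lists every constructor argument
    allowed = MODEL_REGISTRY[model_name]().get_params()
    return [f"unknown parameter: {key!r}" for key in params if key not in allowed]


def build_model(model_name, **params):
    """Instantiate a registered estimator after validating its parameters."""
    problems = validate_params(model_name, params)
    if problems:
        raise ValueError("; ".join(problems))
    return MODEL_REGISTRY[model_name](**params)


model = build_model("random_forest_classifier", n_estimators=50, max_depth=3)
print(available_models())
print(validate_params("ridge", {"alpha": 1.0, "depth": 2}))
```

Returning a list of problems instead of raising immediately lets an agent report every bad argument at once, which is convenient when a language model is generating the tool call.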

### Statsmodels Functions

- `universal_linear_models()`: Linear model analysis
- `universal_glm()`: Generalized linear model analysis
- `universal_nonparametric()`: Nonparametric analysis
- `universal_rlm()`: Robust linear model analysis
- `universal_anova()`: ANOVA analysis
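As a reference point for what an ANOVA tool computes, here is a minimal one-way ANOVA F statistic in NumPy. It shows the standard between/within sum-of-squares decomposition, not `universal_anova()` itself, whose inputs and outputs are not documented here; `one_way_anova` is a hypothetical helper. A p-value would come from the F distribution (for example `scipy.stats.f.sf`) and is not computed in this sketch.

```python
import numpy as np


def one_way_anova(*groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n_total = len(groups), all_values.size

    # Between-group sum of squares: group size times squared deviation of each group mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(2, 6) = 27.0
```

Here the group means are 2, 5, and 8 around a grand mean of 5, giving SS_between = 54 on 2 degrees of freedom and SS_within = 6 on 6, so F = 27 / 1 = 27.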

Each function has corresponding helper functions for model info extraction, tool creation, and parameter validation.

## Examples

See the `examples/` directory for comprehensive examples:

- **`basic_usage.py`** - Basic usage demonstration (Python script)
- **`basic_usage.ipynb`** - Interactive Jupyter notebook with basic usage
- **`langgraph_integration.py`** - LangGraph integration example (Python script)
- **`langgraph_integration.ipynb`** - Interactive Jupyter notebook with LangGraph integration

### Running Examples

**Python Scripts:**
```bash
python examples/basic_usage.py
python examples/langgraph_integration.py
```

**Jupyter Notebooks:**
```bash
jupyter notebook examples/basic_usage.ipynb
jupyter notebook examples/langgraph_integration.ipynb
```

## Contributing

We welcome contributions! Please see our contributing guidelines for details.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Support

- Documentation: [https://datagent.readthedocs.io/](https://datagent.readthedocs.io/)
- Issues: [https://github.com/superpandas-ai/datagent/issues](https://github.com/superpandas-ai/datagent/issues)
- Discussions: [https://github.com/superpandas-ai/datagent/discussions](https://github.com/superpandas-ai/datagent/discussions)

## Citation

If you use DataAgent in your research, please cite:

```bibtex
@software{datagent2024,
  title={DataAgent: A comprehensive data analysis toolkit for LangGraph agents},
  author={DataAgent Team},
  year={2024},
  url={https://github.com/superpandas-ai/datagent}
}
```


Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "datagent",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Haris Jabbar <haris@superpandas.ai>",
    "keywords": "machine-learning, scikit-learn, statsmodels, data-analysis, langgraph, langchain",
    "author": null,
    "author_email": "Haris Jabbar <haris@superpandas.ai>",
    "download_url": "https://files.pythonhosted.org/packages/cc/0a/268e5ed4d1b50e78248e6e7c9d226d50882059741a7a7f2890331b4de403/datagent-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A comprehensive data analysis toolkit with universal sklearn and statsmodels tools for LangGraph agents",
    "version": "0.0.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/superpandas-ai/datagent/issues",
        "Documentation": "https://datagent.readthedocs.io/",
        "Homepage": "https://github.com/superpandas-ai/datagent",
        "Repository": "https://github.com/superpandas-ai/datagent"
    },
    "split_keywords": [
        "machine-learning",
        " scikit-learn",
        " statsmodels",
        " data-analysis",
        " langgraph",
        " langchain"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b3b009f6a2d09fb9724345c58380578f4b1f027fc684d7a0a18373cb5370c3ba",
                "md5": "3835abb373346bbcd597e0b561a5b2e6",
                "sha256": "8108cf197397fa0122d7b9c4cb5303b23e733c700f71cc51eac98a03bec15f2b"
            },
            "downloads": -1,
            "filename": "datagent-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3835abb373346bbcd597e0b561a5b2e6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 7130,
            "upload_time": "2025-08-24T17:44:19",
            "upload_time_iso_8601": "2025-08-24T17:44:19.044757Z",
            "url": "https://files.pythonhosted.org/packages/b3/b0/09f6a2d09fb9724345c58380578f4b1f027fc684d7a0a18373cb5370c3ba/datagent-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cc0a268e5ed4d1b50e78248e6e7c9d226d50882059741a7a7f2890331b4de403",
                "md5": "15554b8b77e37b9e46c90a8b55bff4c8",
                "sha256": "7cceefca12bb01039f6fb72484af3ae0bda92eceda40a8663971fd5e163c116a"
            },
            "downloads": -1,
            "filename": "datagent-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "15554b8b77e37b9e46c90a8b55bff4c8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 150830,
            "upload_time": "2025-08-24T17:44:20",
            "upload_time_iso_8601": "2025-08-24T17:44:20.514781Z",
            "url": "https://files.pythonhosted.org/packages/cc/0a/268e5ed4d1b50e78248e6e7c9d226d50882059741a7a7f2890331b4de403/datagent-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-24 17:44:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "superpandas-ai",
    "github_project": "datagent",
    "github_not_found": true,
    "lcname": "datagent"
}