polars-grouper


- Name: polars-grouper
- Version: 0.3.0
- Summary: High-performance graph analysis and pattern mining extension for Polars
- Upload time: 2024-10-24 20:38:08
- Requires Python: >=3.8
- License: MIT
- Keywords: polars, graph, network, clustering, data-science

# PolarsGrouper

PolarsGrouper is a Rust-based extension for Polars that provides efficient graph analysis capabilities, with a focus on component grouping and network analysis.

## Core Features

### Component Grouping
- `super_merger`: Easy-to-use wrapper for grouping connected components
- `super_merger_weighted`: Component grouping with weight thresholds
- Efficient implementation using Rust and Polars
- Works with both eager and lazy Polars DataFrames

### Additional Graph Analytics
- **Shortest Path Analysis**: Find shortest paths between nodes
- **PageRank**: Calculate node importance scores
- **Betweenness Centrality**: Identify key bridge nodes
- **Association Rules**: Discover item relationships and patterns

## Installation

```sh
pip install polars-grouper

# For development (builds the Rust extension from a source checkout):
python -m venv .venv
source .venv/bin/activate
pip install maturin
maturin develop
```

## Usage Examples

### Basic Component Grouping
`super_merger` identifies connected components. Pass the DataFrame and the names of the two columns that hold each edge's endpoints:

```python
import polars as pl
from polars_grouper import super_merger

df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E", "F"],
    "to": ["B", "C", "A", "E", "F", "D"],
    "value": [1, 2, 3, 4, 5, 6]
})

result = super_merger(df, "from", "to")
print(result)
```
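To make "connected components" concrete, here is a minimal plain-Python sketch that groups the same edges with a union-find structure. It illustrates the concept only: the helper `connected_components` is hypothetical, and nothing here is taken from the library's Rust implementation or its output format.

```python
def connected_components(edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {}

    def find(node):
        # Walk up to the root, shortening the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Same edges as the example above: A-B, B-C, C-A and D-E, E-F, F-D.
edges = list(zip(["A", "B", "C", "D", "E", "F"], ["B", "C", "A", "E", "F", "D"]))
components = connected_components(edges)
print(components)
```

The six edges form two separate triangles, so the sketch yields two groups of three nodes each.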

### Weighted Component Grouping
When edges carry weights, `super_merger_weighted` additionally takes the name of the weight column and a `weight_threshold`:

```python
import polars as pl
from polars_grouper import super_merger_weighted

df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E"],
    "to": ["B", "C", "D", "E", "A"],
    "weight": [0.9, 0.2, 0.05, 0.8, 0.3]
})

result = super_merger_weighted(
    df,
    "from",
    "to",
    "weight",
    weight_threshold=0.3
)
print(result)
```
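One plausible reading of a weight threshold is that only sufficiently strong edges are allowed to join components. The plain-Python sketch below works under that assumption. The inclusive `>=` comparison and the helper `components_above_threshold` are assumptions made for illustration, not documented behavior of `super_merger_weighted`.

```python
from collections import defaultdict


def components_above_threshold(edges, threshold):
    """Connected components using only edges with weight >= threshold."""
    adjacency = defaultdict(set)
    nodes = set()
    for src, dst, weight in edges:
        nodes.update((src, dst))
        if weight >= threshold:  # assumption: the cut-off is inclusive
            adjacency[src].add(dst)
            adjacency[dst].add(src)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal over the edges that survived the filter.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        components.append(sorted(group))
    return components


# Same edges and threshold as the example above.
edges = [
    ("A", "B", 0.9),
    ("B", "C", 0.2),
    ("C", "D", 0.05),
    ("D", "E", 0.8),
    ("E", "A", 0.3),
]
components = components_above_threshold(edges, 0.3)
print(components)
```

Under these assumptions the weak edges B-C and C-D are ignored, which leaves C on its own while A, B, D and E stay connected.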

### Additional Graph Analytics

#### Shortest Path Analysis
Find shortest paths between nodes:

```python
import polars as pl
from polars_grouper import calculate_shortest_path

df = pl.DataFrame({
    "from": ["A", "A", "B", "C"],
    "to": ["B", "C", "C", "D"],
    "weight": [1.0, 2.0, 1.0, 1.5]
})

paths = df.select(
    calculate_shortest_path(
        pl.col("from"),
        pl.col("to"),
        pl.col("weight"),
        directed=False
    ).alias("paths")
).unnest("paths")
```
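For reference, shortest paths on a graph with non-negative weights are commonly computed with Dijkstra's algorithm. The sketch below runs it in plain Python on the same edges, treated as undirected. Whether the library uses this algorithm internally is not stated here, and the helper `shortest_distances` is hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_distances(edges, source, directed=False):
    """Dijkstra's algorithm: distance from `source` to every reachable node."""
    graph = defaultdict(list)
    for src, dst, weight in edges:
        graph[src].append((dst, weight))
        if not directed:
            graph[dst].append((src, weight))

    distances = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Same edges as the example above.
edges = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 1.0), ("C", "D", 1.5)]
distances = shortest_distances(edges, "A")
print(distances)
```

Starting from A, node D is reached through C at a total cost of 3.5.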

#### PageRank Calculation
Calculate node importance:

```python
import polars as pl
from polars_grouper import page_rank

df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D"],
    "to": ["B", "C", "C", "A", "B"]
})

rankings = df.select(
    page_rank(
        pl.col("from"),
        pl.col("to"),
        damping_factor=0.85
    ).alias("pagerank")
).unnest("pagerank")
```
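The role of `damping_factor` is easiest to see in the textbook power-iteration form of PageRank. The following plain-Python sketch applies it to the same edges. It is a generic illustration, the helper `page_rank_scores` and its fixed iteration count are assumptions, and the library's scores may differ in convergence criteria or in how it handles nodes without outgoing edges.

```python
def page_rank_scores(edges, damping_factor=0.85, iterations=100):
    """Textbook PageRank by power iteration on a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_ranks = {node: base for node in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_ranks[dst] += damping_factor * ranks[src] / len(targets)
        ranks = new_ranks
    return ranks


# Same directed edges as the example above.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "B")]
ranks = page_rank_scores(edges, damping_factor=0.85)
print(ranks)
```

Node D has no incoming edges, so it keeps only the baseline share `(1 - damping_factor) / n`, while C, which receives rank from both A and B, ends up with the highest score.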

#### Association Rule Mining
Discover item relationships:

```python
import polars as pl
from polars_grouper import graph_association_rules

transactions = pl.DataFrame({
    "transaction_id": [1, 1, 1, 2, 2, 3],
    "item_id": ["A", "B", "C", "B", "D", "A"],
    "frequency": [1, 2, 1, 1, 1, 1]
})

rules = transactions.select(
    graph_association_rules(
        pl.col("transaction_id"),
        pl.col("item_id"),
        pl.col("frequency"),
        min_support=0.1
    ).alias("rules")
).unnest("rules")
```
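As background for `min_support`, association rules are usually scored by support (the share of transactions containing an item set) and confidence (support of the pair divided by support of the antecedent). The sketch below computes both for item pairs in plain Python. It ignores the `frequency` column, the helper `pair_rules` is hypothetical, and the library's graph-based method may define these measures differently.

```python
from itertools import combinations


def pair_rules(rows, min_support=0.1):
    """Support and confidence for every item pair that meets min_support."""
    baskets = {}
    for transaction_id, item in rows:
        baskets.setdefault(transaction_id, set()).add(item)
    total = len(baskets)

    def support(items):
        # Fraction of transactions that contain every item in `items`.
        return sum(1 for basket in baskets.values() if items <= basket) / total

    rules = []
    all_items = sorted(set().union(*baskets.values()))
    for left, right in combinations(all_items, 2):
        pair_support = support({left, right})
        if pair_support < min_support:
            continue
        for antecedent, consequent in ((left, right), (right, left)):
            confidence = pair_support / support({antecedent})
            rules.append((antecedent, consequent, pair_support, confidence))
    return rules


# Same transactions as the example above, without the frequency column.
rows = list(zip([1, 1, 1, 2, 2, 3], ["A", "B", "C", "B", "D", "A"]))
rules = pair_rules(rows, min_support=0.1)
for antecedent, consequent, pair_support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={pair_support:.2f}, confidence={confidence:.2f}")
```

Here A appears in two of three transactions and together with B in one, so the rule A -> B has support 1/3 and confidence 0.5.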

#### Betweenness Centrality
Identify bridge nodes:

```python
import polars as pl
from polars_grouper import betweenness_centrality

df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D", "E"],
    "to": ["B", "C", "C", "D", "E", "A"]
})

centrality = df.select(
    betweenness_centrality(
        pl.col("from"),
        pl.col("to"),
        normalized=True
    ).alias("centrality")
).unnest("centrality")
```
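Betweenness centrality counts how often a node lies on shortest paths between other nodes. The plain-Python sketch below uses Brandes' algorithm on the same edges, treated as undirected and unweighted. The normalization by `(n - 1)(n - 2) / 2` is the common convention for undirected graphs and is an assumption here, as is the helper name `betweenness`. The library may scale its scores differently.

```python
from collections import defaultdict, deque


def betweenness(edges, normalized=True):
    """Brandes' algorithm for an undirected, unweighted graph."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph[dst].add(src)
    nodes = sorted(graph)
    scores = dict.fromkeys(nodes, 0.0)

    for source in nodes:
        # Breadth-first search that counts shortest paths from `source`.
        order, predecessors = [], defaultdict(list)
        path_count = dict.fromkeys(nodes, 0.0)
        path_count[source] = 1.0
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)

        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = dict.fromkeys(nodes, 0.0)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]

    n = len(nodes)
    scale = 0.5  # each undirected pair was counted from both endpoints
    if normalized and n > 2:
        scale /= (n - 1) * (n - 2) / 2
    return {node: score * scale for node, score in scores.items()}


# Same edges as the example above.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
raw_scores = betweenness(edges, normalized=False)
normalized_scores = betweenness(edges, normalized=True)
print(normalized_scores)
```

In this graph A and C carry the most shortest paths, while B lies on none because its two neighbors are directly connected.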

## Performance

The library is implemented in Rust for high performance:
- Efficient memory usage
- Fast computation for large graphs
- Seamless integration with Polars' lazy evaluation

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "polars-grouper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "polars, graph, network, clustering, data-science",
    "author": null,
    "author_email": "Edward Vaneechoud <evaneechoud@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/98/b5/a5bc4de4f288f34483cae6a98ea65a26bfee34f15f2c381ecae52e345d03/polars_grouper-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "High-performance graph analysis and pattern mining extension for Polars",
    "version": "0.3.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/yourusername/polars-grouper/issues",
        "Documentation": "https://github.com/yourusername/polars-grouper#readme",
        "Homepage": "https://github.com/yourusername/polars-grouper"
    },
    "split_keywords": [
        "polars",
        " graph",
        " network",
        " clustering",
        " data-science"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "668481fd9b5a35668684cfd84b9902283d25aa83857acbaf88237ccaeae24819",
                "md5": "7b9ea57acdff57566132dd16367eba8e",
                "sha256": "6a2c56eb4621502447268c2d40bfc7696fe291691fe777b257cdda869bfbdde2"
            },
            "downloads": -1,
            "filename": "polars_grouper-0.3.0-cp38-abi3-macosx_10_12_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7b9ea57acdff57566132dd16367eba8e",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 3648015,
            "upload_time": "2024-10-24T20:37:58",
            "upload_time_iso_8601": "2024-10-24T20:37:58.134623Z",
            "url": "https://files.pythonhosted.org/packages/66/84/81fd9b5a35668684cfd84b9902283d25aa83857acbaf88237ccaeae24819/polars_grouper-0.3.0-cp38-abi3-macosx_10_12_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bf741d5e452b71b7e08615adf3a15d799f593d5cf16382e2bded1d9c50566b8f",
                "md5": "30b7e09809e6df11dbbcd3e491485655",
                "sha256": "3701fea159f2104d78e8aaad65c2af698275a8b8aa036a8c1d98ef18de06a822"
            },
            "downloads": -1,
            "filename": "polars_grouper-0.3.0-cp38-abi3-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "30b7e09809e6df11dbbcd3e491485655",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 3361939,
            "upload_time": "2024-10-24T20:38:00",
            "upload_time_iso_8601": "2024-10-24T20:38:00.242486Z",
            "url": "https://files.pythonhosted.org/packages/bf/74/1d5e452b71b7e08615adf3a15d799f593d5cf16382e2bded1d9c50566b8f/polars_grouper-0.3.0-cp38-abi3-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6500dac0055ce0aec0eb7fc6ac7807690ecad84fff23b27e29b0ad48ebd22b14",
                "md5": "1b12186f0f62c4351877289d70ae45cc",
                "sha256": "29cb97892720b464a9109c31229a4df086665cfcad7390a90c0417fcbfb0b9fd"
            },
            "downloads": -1,
            "filename": "polars_grouper-0.3.0-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl",
            "has_sig": false,
            "md5_digest": "1b12186f0f62c4351877289d70ae45cc",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4406442,
            "upload_time": "2024-10-24T20:38:02",
            "upload_time_iso_8601": "2024-10-24T20:38:02.547454Z",
            "url": "https://files.pythonhosted.org/packages/65/00/dac0055ce0aec0eb7fc6ac7807690ecad84fff23b27e29b0ad48ebd22b14/polars_grouper-0.3.0-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "59a7a76c6f6ac2bdc8ed3f45c98886d00f6eb661f5ba6408268e0ac06f3cbae5",
                "md5": "a493241e0a154cb8c5b00c599c08a7be",
                "sha256": "447974f42782c9998a49d70e82caf124c39c69094a5d845d4d3778e084409ebc"
            },
            "downloads": -1,
            "filename": "polars_grouper-0.3.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "a493241e0a154cb8c5b00c599c08a7be",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 4010386,
            "upload_time": "2024-10-24T20:38:04",
            "upload_time_iso_8601": "2024-10-24T20:38:04.494456Z",
            "url": "https://files.pythonhosted.org/packages/59/a7/a76c6f6ac2bdc8ed3f45c98886d00f6eb661f5ba6408268e0ac06f3cbae5/polars_grouper-0.3.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e2c66515349d6d39ea395a04d45c91cbba599cff993cf8991e5aed3c5f9b255a",
                "md5": "73ef96b7d906d01585ab1e0cf0ba3ff3",
                "sha256": "1fc5e028c0bb1c2e3e5d18d4357da3b06502cfcfd14061d23e15520dcf7caa5e"
            },
            "downloads": -1,
            "filename": "polars_grouper-0.3.0-cp38-abi3-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "73ef96b7d906d01585ab1e0cf0ba3ff3",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 3526180,
            "upload_time": "2024-10-24T20:38:06",
            "upload_time_iso_8601": "2024-10-24T20:38:06.576653Z",
            "url": "https://files.pythonhosted.org/packages/e2/c6/6515349d6d39ea395a04d45c91cbba599cff993cf8991e5aed3c5f9b255a/polars_grouper-0.3.0-cp38-abi3-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "98b5a5bc4de4f288f34483cae6a98ea65a26bfee34f15f2c381ecae52e345d03",
                "md5": "f1a992fddbe2c2f83929a0dd6abccb04",
                "sha256": "76707a74ab55cca25b1c5066a293a29ae48baac3cc0db152983ed5230feeb622"
            },
            "downloads": -1,
            "filename": "polars_grouper-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f1a992fddbe2c2f83929a0dd6abccb04",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 30097,
            "upload_time": "2024-10-24T20:38:08",
            "upload_time_iso_8601": "2024-10-24T20:38:08.340598Z",
            "url": "https://files.pythonhosted.org/packages/98/b5/a5bc4de4f288f34483cae6a98ea65a26bfee34f15f2c381ecae52e345d03/polars_grouper-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-24 20:38:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yourusername",
    "github_project": "polars-grouper",
    "github_not_found": true,
    "lcname": "polars-grouper"
}
        