# PolarsGrouper
PolarsGrouper is a Rust-based extension for Polars that provides efficient graph analysis capabilities, with a focus on component grouping and network analysis.
## Core Features
### Component Grouping
- `super_merger`: Easy-to-use wrapper for grouping connected components
- `super_merger_weighted`: Component grouping with weight thresholds
- Efficient implementation using Rust and Polars
- Works with both eager and lazy Polars DataFrames
### Additional Graph Analytics
- **Shortest Path Analysis**: Find shortest paths between nodes
- **PageRank**: Calculate node importance scores
- **Betweenness Centrality**: Identify key bridge nodes
- **Association Rules**: Discover item relationships and patterns
## Installation
```sh
pip install polars-grouper

# For development, from a clone of the repository (requires maturin):
python -m venv .venv
source .venv/bin/activate
maturin develop
```
## Usage Examples
### Basic Component Grouping
`super_merger` identifies connected components from two edge columns. In the sample below, the edges form two components: A-B-C and D-E-F.
```python
import polars as pl
from polars_grouper import super_merger

# Edge list: each row links a "from" node to a "to" node.
df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E", "F"],
    "to": ["B", "C", "A", "E", "F", "D"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Identify connected components from the two edge columns.
result = super_merger(df, "from", "to")
print(result)
```
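To make concrete what "connected components" means for the sample edges above, here is a minimal pure-Python union-find sketch. It is an illustration only, not the library's Rust implementation, and the helper name `connected_components` is hypothetical.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {}

    def find(node):
        # Walk up to the root, compressing the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst  # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Same edges as the "from" and "to" columns in the example above.
edges = list(zip(["A", "B", "C", "D", "E", "F"], ["B", "C", "A", "E", "F", "D"]))
components = connected_components(edges)
print(components)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Each row of the DataFrame would then be labeled with the component its nodes fall into; the exact name and type of the output column depend on the library.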
### Weighted Component Grouping
When edge weights matter, `super_merger_weighted` takes a weight column and a `weight_threshold`:
```python
import polars as pl
from polars_grouper import super_merger_weighted

# Edge list with a weight per edge.
df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E"],
    "to": ["B", "C", "D", "E", "A"],
    "weight": [0.9, 0.2, 0.05, 0.8, 0.3],
})

result = super_merger_weighted(
    df,
    "from",
    "to",
    "weight",
    weight_threshold=0.3,
)
print(result)
```
```
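The sketch below shows one plausible reading of threshold-based grouping: edges below the threshold are ignored, and the remaining edges define the components. Whether the library keeps edges with weight `>=` the threshold, strictly greater, or uses another rule is an assumption here, and `components_above_threshold` is a hypothetical helper, not part of the package.

```python
def components_above_threshold(edges, threshold):
    """Group nodes using only edges whose weight is >= threshold (assumed rule)."""
    adjacency = {}
    for src, dst, weight in edges:
        # Register both endpoints so nodes with no surviving edge become singletons.
        adjacency.setdefault(src, set())
        adjacency.setdefault(dst, set())
        if weight >= threshold:
            adjacency[src].add(dst)
            adjacency[dst].add(src)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk over the surviving edges.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Same edges and weights as the example above.
weighted_edges = [
    ("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.05),
    ("D", "E", 0.8), ("E", "A", 0.3),
]
weighted_groups = components_above_threshold(weighted_edges, threshold=0.3)
print(weighted_groups)  # [['A', 'B', 'D', 'E'], ['C']]
```

Under this assumed rule, the weak B-C and C-D edges are dropped, so C ends up alone while A, B, D, and E stay connected.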
### Additional Graph Analytics
#### Shortest Path Analysis
Find shortest paths between nodes:
```python
import polars as pl
from polars_grouper import calculate_shortest_path

# Weighted edge list.
df = pl.DataFrame({
    "from": ["A", "A", "B", "C"],
    "to": ["B", "C", "C", "D"],
    "weight": [1.0, 2.0, 1.0, 1.5],
})

# The expression yields a struct column, which unnest() expands into columns.
paths = df.select(
    calculate_shortest_path(
        pl.col("from"),
        pl.col("to"),
        pl.col("weight"),
        directed=False,
    ).alias("paths")
).unnest("paths")
```
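For reference, the distances one would expect on this small graph can be checked with a plain Dijkstra sketch. This is not the library's code; the function name `shortest_distances` and the single-source interface are assumptions made for illustration.

```python
import heapq


def shortest_distances(edges, source, directed=False):
    """Dijkstra's algorithm over a list of (src, dst, weight) edges."""
    adjacency = {}
    for src, dst, weight in edges:
        adjacency.setdefault(src, []).append((dst, weight))
        adjacency.setdefault(dst, [])
        if not directed:
            adjacency[dst].append((src, weight))

    distances = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in adjacency.get(node, []):
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances


# Same edges and weights as the example above, treated as undirected.
path_edges = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 1.0), ("C", "D", 1.5)]
distances_from_a = shortest_distances(path_edges, "A")
print(distances_from_a)  # {'A': 0.0, 'B': 1.0, 'C': 2.0, 'D': 3.5}
```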
#### PageRank Calculation
Calculate node importance:
```python
import polars as pl
from polars_grouper import page_rank

# Directed edge list: "from" links to "to".
df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D"],
    "to": ["B", "C", "C", "A", "B"],
})

rankings = df.select(
    page_rank(
        pl.col("from"),
        pl.col("to"),
        damping_factor=0.85,
    ).alias("pagerank")
).unnest("pagerank")
```
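As a rough cross-check of what `damping_factor` does, the following power-iteration sketch computes PageRank for the same edges in plain Python. It assumes the standard formulation with scores summing to 1; the library may scale or converge differently, and `page_rank_sketch` is a hypothetical name.

```python
def page_rank_sketch(edges, damping_factor=0.85, iterations=100):
    """Standard PageRank by power iteration over directed (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is shared by everyone.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping_factor + damping_factor * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping_factor * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Same directed edges as the example above.
rank_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "B")]
scores = page_rank_sketch(rank_edges, damping_factor=0.85)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this formulation, C ranks highest because it receives links from both A and B, while D, which nothing links to, keeps only the baseline share of `(1 - damping_factor) / 4`.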
#### Association Rule Mining
Discover item relationships:
```python
import polars as pl
from polars_grouper import graph_association_rules

# One row per (transaction, item) pair, with an item frequency.
transactions = pl.DataFrame({
    "transaction_id": [1, 1, 1, 2, 2, 3],
    "item_id": ["A", "B", "C", "B", "D", "A"],
    "frequency": [1, 2, 1, 1, 1, 1],
})

rules = transactions.select(
    graph_association_rules(
        pl.col("transaction_id"),
        pl.col("item_id"),
        pl.col("frequency"),
        min_support=0.1,
    ).alias("rules")
).unnest("rules")
```
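To illustrate the idea behind `min_support`, here is a sketch of classic pairwise support and confidence on the same transactions. It ignores the `frequency` column and does not claim to reproduce the graph-based method or the output columns of `graph_association_rules`; `pairwise_rules` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import permutations


def pairwise_rules(rows, min_support=0.1):
    """Support and confidence for every ordered item pair (antecedent, consequent)."""
    baskets = defaultdict(set)
    for transaction_id, item_id in rows:
        baskets[transaction_id].add(item_id)
    total = len(baskets)

    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for antecedent, consequent in permutations(sorted(items), 2):
            pair_count[(antecedent, consequent)] += 1

    found = {}
    for (antecedent, consequent), count in pair_count.items():
        support = count / total  # share of transactions containing both items
        if support >= min_support:
            found[(antecedent, consequent)] = {
                "support": support,
                # Share of the antecedent's transactions that also hold the consequent.
                "confidence": count / item_count[antecedent],
            }
    return found


# Same transaction_id and item_id columns as the example above.
rows = list(zip([1, 1, 1, 2, 2, 3], ["A", "B", "C", "B", "D", "A"]))
pair_rules = pairwise_rules(rows, min_support=0.1)
print(pair_rules[("A", "B")])
print(pair_rules[("D", "B")])
```

Here A and B co-occur in one of three transactions, so the rule A → B has support 1/3 and confidence 0.5, while D → B has confidence 1.0 because D only ever appears alongside B.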
#### Betweenness Centrality
Identify bridge nodes:
```python
import polars as pl
from polars_grouper import betweenness_centrality

# Edge list for a small graph with one triangle and one longer cycle.
df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D", "E"],
    "to": ["B", "C", "C", "D", "E", "A"],
})

centrality = df.select(
    betweenness_centrality(
        pl.col("from"),
        pl.col("to"),
        normalized=True,
    ).alias("centrality")
).unnest("centrality")
```
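The following sketch applies Brandes' algorithm to the same edges to show what betweenness centrality measures. It assumes an unweighted, undirected graph and the common normalization by `(n - 1)(n - 2) / 2`; whether the library treats edges as directed or normalizes the same way is not stated in this README, and `betweenness_sketch` is a hypothetical name.

```python
from collections import deque


def betweenness_sketch(edges, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order, predecessors = [], {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)

        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]

    n = len(adjacency)
    scale = 0.5  # each undirected pair was counted from both endpoints
    if normalized and n > 2:
        scale /= (n - 1) * (n - 2) / 2
    return {node: value * scale for node, value in scores.items()}


# Same edges as the example above, treated as undirected.
bridge_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
raw_scores = betweenness_sketch(bridge_edges, normalized=False)
norm_scores = betweenness_sketch(bridge_edges, normalized=True)
print(raw_scores)
print(norm_scores)
```

Under these assumptions, A and C score highest because they sit on the shortest paths linking B to the D-E side of the graph, while B lies on no shortest path between other nodes and scores zero.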
## Performance
The library is implemented in Rust for high performance:
- Efficient memory usage
- Fast computation for large graphs
- Seamless integration with Polars' lazy evaluation
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.