| Field | Value |
| --- | --- |
| Name | cpgdata |
| Version | 0.4.0 |
| home_page | None |
| Summary | Cell painting gallery data handling and validation |
| upload_time | 2024-05-06 03:17:32 |
| maintainer | None |
| docs_url | None |
| author | Ankur Kumar |
| requires_python | <4.0,>=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Cell painting gallery data handling and validation
## Getting started
### Install `cpgdata` package
```bash
pip install cpgdata
```
### Sync pre-generated index files
```bash
cpg index sync -o "path to save index files"
```
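
Before querying the index, it can help to confirm that the sync actually produced files. The following is a minimal sketch, assuming the index is stored as `*.parquet` files directly inside the output directory (the pattern the usage example in this README globs for). The helper names `list_index_files` and `summarize` are hypothetical and not part of `cpgdata`.

```python
from pathlib import Path


def list_index_files(index_dir: Path) -> list[Path]:
    """Return the parquet files found directly inside index_dir, sorted by name."""
    files = sorted(index_dir.glob("*.parquet"))
    if not files:
        # An empty result usually means the sync target path is wrong
        raise FileNotFoundError(f"no *.parquet index files found in {index_dir}")
    return files


def summarize(files: list[Path]) -> str:
    """Build a one-line summary: file count and total size in MiB."""
    total_bytes = sum(f.stat().st_size for f in files)
    return f"{len(files)} index file(s), {total_bytes / 2**20:.1f} MiB"
```

For example, `print(summarize(list_index_files(Path("path to save index files"))))` reports how many index files were found, or raises `FileNotFoundError` if the directory holds none.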
### Example: filter the index to select files to download from the Cell painting gallery
```python
from pathlib import Path
from pprint import pprint

import polars as pl
from cpgdata.utils import download_s3_files, parallel

# Lazily scan every parquet index file in the index directory
index_dir = Path("path to dir containing index files")
index_files = list(index_dir.glob("*.parquet"))
df = pl.scan_parquet(index_files)

# Select the keys of the Cells.csv files from source_4 of cpg0016-jump
df = (
    df
    .filter(pl.col("dataset_id").eq("cpg0016-jump"))
    .filter(pl.col("source_id").eq("source_4"))
    # literal=True matches the "." as a plain character, not a regex wildcard
    .filter(pl.col("leaf_node").str.contains("Cells.csv", literal=True))
    .select(pl.col("key"))
    .collect()
)

# Print the first 10 results
pprint(df.to_dicts()[0:10])

# Download the filtered files
download_keys = df["key"].to_list()
parallel(download_keys, download_s3_files, ["cellpainting-gallery", Path("path to save downloaded files")], jobs=20)
```
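
One detail of the `leaf_node` filter is worth spelling out: Polars' `str.contains` interprets its pattern as a regular expression unless `literal=True` is passed, so the `.` in `Cells.csv` stands for any single character. The sketch below reproduces that difference with the standard library only. The file names are hypothetical and chosen purely to show the matching rules; they are not taken from the gallery.

```python
import re

# Hypothetical file names, chosen only to illustrate the matching rules
names = ["Cells.csv", "Cells_csv", "Cellsxcsv.bak", "Nuclei.csv"]


def regex_matches(pattern: str, candidates: list[str]) -> list[str]:
    """Keep candidates where pattern, read as a regex, matches anywhere."""
    return [c for c in candidates if re.search(pattern, c)]


def literal_matches(text: str, candidates: list[str]) -> list[str]:
    """Keep candidates that contain text as a plain substring."""
    return [c for c in candidates if text in c]


# Regex semantics: "." is a wildcard, so near-miss names also match
print(regex_matches("Cells.csv", names))
# → ['Cells.csv', 'Cells_csv', 'Cellsxcsv.bak']

# Literal semantics: only the exact substring matches
print(literal_matches("Cells.csv", names))
# → ['Cells.csv']

# Escaping the pattern gives the regex engine the same literal behavior
print(regex_matches(re.escape("Cells.csv"), names))
# → ['Cells.csv']
```

Whether the looser regex match ever selects unwanted keys depends on what file names the index actually contains; the literal form simply removes that uncertainty.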
Raw data
{
"_id": null,
"home_page": null,
"name": "cpgdata",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Ankur Kumar",
"author_email": "ank@leoank.me",
"download_url": "https://files.pythonhosted.org/packages/8a/e2/ebe089272c70bbbcc83d8b8a1bed56b1292f903cfa036c743221369c02b4/cpgdata-0.4.0.tar.gz",
"platform": null,
"description": "# Cell painting gallery data handling and validation\n\n## Getting started\n\n\n### Install `cpgdata` package\n\n```bash\npip install cpgdata\n```\n\n### Sync pre-generated index files\n\n```bash\ncpg index sync -o \"path to save index files\"\n```\n\n### Example of using the index for filtering files to download from the Cell painting gallery\n\n```python\nfrom pathlib import Path\nfrom pprint import pprint\n\nimport polars as pl\nfrom cpgdata.utils import download_s3_files, parallel\n\nindex_dir = Path(\"path to dir containing index files\")\nindex_files = [file for file in index_dir.glob(\"*.parquet\")]\ndf = pl.scan_parquet(index_files)\n\ndf = (\n df\n .filter(pl.col(\"dataset_id\").eq(\"cpg0016-jump\"))\n .filter(pl.col(\"source_id\").eq(\"source_4\"))\n .filter(pl.col(\"leaf_node\").str.contains(\"Cells.csv\"))\n .select(pl.col(\"key\"))\n .collect()\n)\n\n# print first 10 results\npprint(df.to_dicts()[0:10])\n\n# Download filtered files\ndownload_keys = list(df.to_dict()[\"key\"])\nparallel(download_keys, download_s3_files, [\"cellpainting-gallery\", Path(\"path to save downloaded files\")], jobs=20)\n\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "Cell painting gallery data handling and validation",
"version": "0.4.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b58067a051811e151eed39f61bda4249bf36946c93e88ea9ac72c629a9d58a91",
"md5": "854c8e2a3e8b498e4e98695047ca9178",
"sha256": "466eea7c3727b8e864bff51f0ff4155988ee755d964fe31dfa31bc8692ad060f"
},
"downloads": -1,
"filename": "cpgdata-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "854c8e2a3e8b498e4e98695047ca9178",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 15051,
"upload_time": "2024-05-06T03:17:31",
"upload_time_iso_8601": "2024-05-06T03:17:31.137899Z",
"url": "https://files.pythonhosted.org/packages/b5/80/67a051811e151eed39f61bda4249bf36946c93e88ea9ac72c629a9d58a91/cpgdata-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8ae2ebe089272c70bbbcc83d8b8a1bed56b1292f903cfa036c743221369c02b4",
"md5": "f2ec666fc79f8b973b234df673c456db",
"sha256": "9c20f38e71170f41ece257d2f4d3fd0db1b4972f3863c3432026efa6e303d5c5"
},
"downloads": -1,
"filename": "cpgdata-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "f2ec666fc79f8b973b234df673c456db",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 11409,
"upload_time": "2024-05-06T03:17:32",
"upload_time_iso_8601": "2024-05-06T03:17:32.373420Z",
"url": "https://files.pythonhosted.org/packages/8a/e2/ebe089272c70bbbcc83d8b8a1bed56b1292f903cfa036c743221369c02b4/cpgdata-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-06 03:17:32",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "cpgdata"
}