cpgdata


| Field | Value |
| --- | --- |
| Name | cpgdata |
| Version | 0.4.0 |
| Summary | Cell painting gallery data handling and validation |
| Upload time | 2024-05-06 03:17:32 |
| Author | Ankur Kumar |
| Requires Python | <4.0,>=3.10 |
| Home page | None |
| Maintainer | None |
| Docs URL | None |
| License | None |
| Requirements | No requirements were recorded. |

# Cell Painting Gallery data handling and validation

## Getting started


### Install `cpgdata` package

```bash
pip install cpgdata
```

### Sync pre-generated index files

```bash
cpg index sync -o "path to save index files"
```
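
Before querying the index, it can be worth confirming that the sync actually produced files. The following is a minimal sketch using only the standard library. It assumes the index is stored as `*.parquet` files directly inside the output directory, and `list_index_files` is a hypothetical helper written for illustration, not part of `cpgdata`.

```python
from pathlib import Path


def list_index_files(index_dir: Path) -> list[Path]:
    """Return the parquet files found directly inside index_dir, sorted by name.

    Hypothetical helper for illustration; not part of cpgdata.
    """
    if not index_dir.is_dir():
        raise NotADirectoryError(f"{index_dir} is not a directory")
    files = sorted(index_dir.glob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"no *.parquet files found in {index_dir}")
    return files
```

Calling `list_index_files(Path("path to save index files"))` with the directory passed to `cpg index sync -o` should return a non-empty list once the sync has finished; an exception suggests the sync did not write where expected.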

### Example: filter the index to select files to download from the Cell Painting Gallery

```python
from pathlib import Path
from pprint import pprint

import polars as pl
from cpgdata.utils import download_s3_files, parallel

# Directory populated by `cpg index sync`
index_dir = Path("path to dir containing index files")
index_files = list(index_dir.glob("*.parquet"))

# Scan lazily so the filters are applied before any rows are materialized
lazy_index = pl.scan_parquet(index_files)

df = (
    lazy_index
    .filter(pl.col("dataset_id").eq("cpg0016-jump"))
    .filter(pl.col("source_id").eq("source_4"))
    # literal=True matches "Cells.csv" as plain text rather than as a regex
    .filter(pl.col("leaf_node").str.contains("Cells.csv", literal=True))
    .select(pl.col("key"))
    .collect()
)

# Print the first 10 results
pprint(df.to_dicts()[0:10])

# Download the filtered files
download_keys = df["key"].to_list()
parallel(
    download_keys,
    download_s3_files,
    ["cellpainting-gallery", Path("path to save downloaded files")],
    jobs=20,
)

```
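
The `jobs` argument in the download call suggests that the keys are spread across concurrent workers. As a rough illustration of that fan-out pattern only, here is a standard-library sketch. It is not how `cpgdata.utils.parallel` is implemented, and `run_in_threads`, `fake_download`, and the example keys are hypothetical stand-ins for real download logic and real index entries.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_in_threads(
    items: Iterable[str],
    worker: Callable[[str], str],
    jobs: int = 4,
) -> list[str]:
    """Apply worker to every item using up to `jobs` threads.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(worker, items))


def fake_download(key: str) -> str:
    """Stand-in for a real download: just report the file name in the key."""
    return key.rsplit("/", 1)[-1]


# Made-up keys for illustration; real keys come from the index.
example_keys = ["some/prefix/a/Cells.csv", "some/prefix/b/Cells.csv"]
results = run_in_threads(example_keys, fake_download, jobs=2)
```

Threads suit this kind of work because downloads are I/O-bound; a sensible `jobs` value depends on available bandwidth and on what the remote service tolerates.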

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cpgdata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Ankur Kumar",
    "author_email": "ank@leoank.me",
    "download_url": "https://files.pythonhosted.org/packages/8a/e2/ebe089272c70bbbcc83d8b8a1bed56b1292f903cfa036c743221369c02b4/cpgdata-0.4.0.tar.gz",
    "platform": null,
    "description": "# Cell painting gallery data handling and validation\n\n## Getting started\n\n\n### Install `cpgdata` package\n\n```bash\npip install cpgdata\n```\n\n### Sync pre-generated index files\n\n```bash\ncpg index sync -o \"path to save index files\"\n```\n\n### Example of using the index for filtering files to download from the Cell painting gallery\n\n```python\nfrom pathlib import Path\nfrom pprint import pprint\n\nimport polars as pl\nfrom cpgdata.utils import download_s3_files, parallel\n\nindex_dir = Path(\"path to dir containing index files\")\nindex_files = [file for file in index_dir.glob(\"*.parquet\")]\ndf = pl.scan_parquet(index_files)\n\ndf = (\n    df\n    .filter(pl.col(\"dataset_id\").eq(\"cpg0016-jump\"))\n    .filter(pl.col(\"source_id\").eq(\"source_4\"))\n    .filter(pl.col(\"leaf_node\").str.contains(\"Cells.csv\"))\n    .select(pl.col(\"key\"))\n    .collect()\n)\n\n# print first 10 results\npprint(df.to_dicts()[0:10])\n\n# Download filtered files\ndownload_keys = list(df.to_dict()[\"key\"])\nparallel(download_keys, download_s3_files, [\"cellpainting-gallery\", Path(\"path to save downloaded files\")], jobs=20)\n\n```\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Cell painting gallery data handling and validation",
    "version": "0.4.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b58067a051811e151eed39f61bda4249bf36946c93e88ea9ac72c629a9d58a91",
                "md5": "854c8e2a3e8b498e4e98695047ca9178",
                "sha256": "466eea7c3727b8e864bff51f0ff4155988ee755d964fe31dfa31bc8692ad060f"
            },
            "downloads": -1,
            "filename": "cpgdata-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "854c8e2a3e8b498e4e98695047ca9178",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 15051,
            "upload_time": "2024-05-06T03:17:31",
            "upload_time_iso_8601": "2024-05-06T03:17:31.137899Z",
            "url": "https://files.pythonhosted.org/packages/b5/80/67a051811e151eed39f61bda4249bf36946c93e88ea9ac72c629a9d58a91/cpgdata-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8ae2ebe089272c70bbbcc83d8b8a1bed56b1292f903cfa036c743221369c02b4",
                "md5": "f2ec666fc79f8b973b234df673c456db",
                "sha256": "9c20f38e71170f41ece257d2f4d3fd0db1b4972f3863c3432026efa6e303d5c5"
            },
            "downloads": -1,
            "filename": "cpgdata-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f2ec666fc79f8b973b234df673c456db",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 11409,
            "upload_time": "2024-05-06T03:17:32",
            "upload_time_iso_8601": "2024-05-06T03:17:32.373420Z",
            "url": "https://files.pythonhosted.org/packages/8a/e2/ebe089272c70bbbcc83d8b8a1bed56b1292f903cfa036c743221369c02b4/cpgdata-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-06 03:17:32",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "cpgdata"
}
        