| Field | Value |
|:--|:--|
| Name | copairs |
| Version | 0.4.1 |
| Summary | Find pairs and compute metrics between them |
| Author | John Arevalo |
| Upload time | 2024-09-05 22:03:08 |
| Requires Python | `<3.12,>=3.9` |
| License | LICENSE.txt |
| Keywords | pairwise, replication |
# copairs
Find pairs and compute metrics between them.
## Installation
```bash
pip install git+https://github.com/cytomining/copairs.git@v0.4.1
```
## Usage
### Data
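Suppose you have a dataset of 20 samples taken from 3 plates (`p1`, `p2`, `p3`).
Each plate has 5 wells (`w1`, `w2`, `w3`, `w4`, `w5`), and each well is
assigned one or more labels (`t1`, `t2`, `t3`, `t4`). The snippet below
generates such a dataset at random; after dropping duplicate rows, 18 unique
samples remain: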
```python
import pandas as pd
import random
random.seed(0)
n_samples = 20
dframe = pd.DataFrame({
    'plate': [random.choice(['p1', 'p2', 'p3']) for _ in range(n_samples)],
    'well': [random.choice(['w1', 'w2', 'w3', 'w4', 'w5']) for _ in range(n_samples)],
    'label': [random.choice(['t1', 't2', 't3', 't4']) for _ in range(n_samples)]
})
dframe = dframe.drop_duplicates()
dframe = dframe.sort_values(by=['plate', 'well', 'label'])
dframe = dframe.reset_index(drop=True)
```
| | plate | well | label |
|---:|:--------|:-------|:--------|
| 0 | p1 | w2 | t4 |
| 1 | p1 | w3 | t2 |
| 2 | p1 | w3 | t4 |
| 3 | p1 | w4 | t1 |
| 4 | p1 | w4 | t3 |
| 5 | p2 | w1 | t1 |
| 6 | p2 | w2 | t1 |
| 7 | p2 | w3 | t1 |
| 8 | p2 | w3 | t2 |
| 9 | p2 | w3 | t3 |
| 10 | p2 | w4 | t2 |
| 11 | p2 | w5 | t1 |
| 12 | p2 | w5 | t3 |
| 13 | p3 | w1 | t3 |
| 14 | p3 | w1 | t4 |
| 15 | p3 | w4 | t2 |
| 16 | p3 | w5 | t2 |
| 17 | p3 | w5 | t4 |
### Getting valid pairs
To get pairs of samples that share the same `label` but come from different
`plate`s and different `well` positions:
```python
from copairs import Matcher
matcher = Matcher(dframe, ['plate', 'well', 'label'], seed=0)
pairs_dict = matcher.get_all_pairs(sameby=['label'], diffby=['plate', 'well'])
```
`pairs_dict` is a `label_id: pairs` dictionary that maps every unique value of
`label` to its list of valid pairs. Each pair is a tuple of row indices of `dframe`:
```
{'t4': [(0, 17), (0, 14), (17, 2), (2, 14)],
't2': [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)],
't1': [(3, 11), (3, 5), (3, 6), (3, 7)],
't3': [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)]}
```
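To make the `sameby`/`diffby` semantics concrete, here is a brute-force sketch in plain Python. It reflects our reading of the example output, not how copairs implements matching: a pair is kept when the two rows agree on every `sameby` column and differ on every `diffby` column. The function `brute_force_pairs` and the hard-coded `ROWS` list (a copy of the table above) are hypothetical names for illustration. It checks all row pairs, so it is only meant for small tables.

```python
from itertools import combinations

# The 18 deduplicated rows from the table above as (plate, well, label).
# Position i in this list corresponds to index i of `dframe`.
ROWS = [
    ("p1", "w2", "t4"), ("p1", "w3", "t2"), ("p1", "w3", "t4"),
    ("p1", "w4", "t1"), ("p1", "w4", "t3"), ("p2", "w1", "t1"),
    ("p2", "w2", "t1"), ("p2", "w3", "t1"), ("p2", "w3", "t2"),
    ("p2", "w3", "t3"), ("p2", "w4", "t2"), ("p2", "w5", "t1"),
    ("p2", "w5", "t3"), ("p3", "w1", "t3"), ("p3", "w1", "t4"),
    ("p3", "w4", "t2"), ("p3", "w5", "t2"), ("p3", "w5", "t4"),
]
COLUMNS = ("plate", "well", "label")


def brute_force_pairs(rows, columns, sameby, diffby):
    """Hypothetical helper: return {sameby_value: [(i, j), ...]} with i < j.

    A pair is kept when the rows agree on every `sameby` column and
    disagree on every `diffby` column.
    """
    col = {name: k for k, name in enumerate(columns)}
    pairs = {}
    for i, j in combinations(range(len(rows)), 2):
        a, b = rows[i], rows[j]
        same = all(a[col[c]] == b[col[c]] for c in sameby)
        diff = all(a[col[c]] != b[col[c]] for c in diffby)
        if same and diff:
            key = tuple(a[col[c]] for c in sameby)
            # Use the bare value as the key when there is a single sameby column.
            key = key[0] if len(key) == 1 else key
            pairs.setdefault(key, []).append((i, j))
    return pairs


pairs = brute_force_pairs(ROWS, COLUMNS, sameby=["label"], diffby=["plate", "well"])
for label in sorted(pairs):
    print(label, pairs[label])
```

This should yield the same pairs as `pairs_dict`, up to the order of the pairs and of the two indices within each pair (the sketch always emits `i < j`). For example, rows 1 and 8 both carry `t2` and come from different plates, but they are not paired because both sit in well `w3`.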
### Getting valid pairs from a multilabel column
For efficiency, you may prefer to avoid duplicated rows. Instead, group all the
labels of a well into a single row and use `MatcherMultilabel` to find the
corresponding pairs:
```python
dframe_multi = dframe.groupby(['plate', 'well'])['label'].unique().reset_index()
```
| | plate | well | label |
|---:|:--------|:-------|:-------------------|
| 0 | p1 | w2 | ['t4'] |
| 1 | p1 | w3 | ['t2', 't4'] |
| 2 | p1 | w4 | ['t1', 't3'] |
| 3 | p2 | w1 | ['t1'] |
| 4 | p2 | w2 | ['t1'] |
| 5 | p2 | w3 | ['t1', 't2', 't3'] |
| 6 | p2 | w4 | ['t2'] |
| 7 | p2 | w5 | ['t1', 't3'] |
| 8 | p3 | w1 | ['t3', 't4'] |
| 9 | p3 | w4 | ['t2'] |
| 10 | p3 | w5 | ['t2', 't4'] |
```python
from copairs import MatcherMultilabel
matcher_multi = MatcherMultilabel(dframe_multi,
                                  columns=['plate', 'well', 'label'],
                                  multilabel_col='label',
                                  seed=0)
pairs_multi = matcher_multi.get_all_pairs(sameby=['label'],
                                          diffby=['plate', 'well'])
```
`pairs_multi` is also a `label_id: pairs` dictionary with the same structure
as before, except that the indices now refer to rows of `dframe_multi`:
```
{'t4': [(0, 10), (0, 8), (10, 1), (1, 8)],
't2': [(1, 10), (1, 6), (1, 9), (5, 10), (5, 9), (6, 10)],
't1': [(2, 7), (2, 3), (2, 4), (2, 5)],
't3': [(5, 2), (5, 8), (8, 2), (8, 7), (2, 7)]}
```
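A similar stdlib-only sketch covers the multilabel case. Again, it illustrates the behaviour shown in the output rather than how `MatcherMultilabel` works internally. The hypothetical helper `group_labels` mimics the `groupby(['plate', 'well'])['label'].unique()` step for already-sorted input, and the hypothetical `multilabel_pairs` pairs rows whose label lists contain a given label and that differ in both `plate` and `well`.

```python
from itertools import combinations

# Single-label rows (plate, well, label), copied from the first table.
ROWS = [
    ("p1", "w2", "t4"), ("p1", "w3", "t2"), ("p1", "w3", "t4"),
    ("p1", "w4", "t1"), ("p1", "w4", "t3"), ("p2", "w1", "t1"),
    ("p2", "w2", "t1"), ("p2", "w3", "t1"), ("p2", "w3", "t2"),
    ("p2", "w3", "t3"), ("p2", "w4", "t2"), ("p2", "w5", "t1"),
    ("p2", "w5", "t3"), ("p3", "w1", "t3"), ("p3", "w1", "t4"),
    ("p3", "w4", "t2"), ("p3", "w5", "t2"), ("p3", "w5", "t4"),
]


def group_labels(rows):
    """Hypothetical helper: collapse rows sharing (plate, well) into one row.

    Groups come out in sorted (plate, well) order; labels keep their
    first-seen order, without duplicates.
    """
    grouped = {}
    for plate, well, label in rows:
        labels = grouped.setdefault((plate, well), [])
        if label not in labels:
            labels.append(label)
    return [(plate, well, labels) for (plate, well), labels in sorted(grouped.items())]


def multilabel_pairs(rows, diffby_idx=(0, 1)):
    """Hypothetical helper: for each label, pair the rows whose label list
    contains it and that differ in every diffby column (plate and well)."""
    all_labels = sorted({lab for _plate, _well, labels in rows for lab in labels})
    pairs = {}
    for label in all_labels:
        members = [i for i, row in enumerate(rows) if label in row[2]]
        for i, j in combinations(members, 2):
            if all(rows[i][k] != rows[j][k] for k in diffby_idx):
                pairs.setdefault(label, []).append((i, j))
    return pairs


multi_rows = group_labels(ROWS)
pairs_multi = multilabel_pairs(multi_rows)
```

Here a single row can contribute to several labels: row 5 (`p2`, `w3`) appears in the `t1`, `t2`, and `t3` pair lists, which is what lets the multilabel layout drop the duplicated rows.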