# copairs

| Field           | Value                 |
|:----------------|:----------------------|
| Name            | copairs               |
| Version         | 0.4.1                 |
| Upload time     | 2024-09-05 22:03:08   |
| Author          | John Arevalo          |
| Requires Python | `<3.12,>=3.9`         |
| License         | LICENSE.txt           |
| Keywords        | pairwise, replication |

Find pairs and compute metrics between them.

## Installation

```bash
pip install git+https://github.com/cytomining/copairs.git@v0.4.1
```

## Usage

### Data

Say you have a dataset of 20 samples drawn from 3 plates (`p1, p2, p3`).
Each plate has 5 wells (`w1, w2, w3, w4, w5`), and each well has one or
more labels (`t1, t2, t3, t4`) assigned. Dropping duplicate rows leaves the
18 unique samples shown in the table below.

```python
import pandas as pd
import random

random.seed(0)
n_samples = 20
dframe = pd.DataFrame({
    'plate': [random.choice(['p1', 'p2', 'p3']) for _ in range(n_samples)],
    'well': [random.choice(['w1', 'w2', 'w3', 'w4', 'w5']) for _ in range(n_samples)],
    'label': [random.choice(['t1', 't2', 't3', 't4']) for _ in range(n_samples)]
})
dframe = dframe.drop_duplicates()
dframe = dframe.sort_values(by=['plate', 'well', 'label'])
dframe = dframe.reset_index(drop=True)
```

|    | plate   | well   | label   |
|---:|:--------|:-------|:--------|
|  0 | p1      | w2     | t4      |
|  1 | p1      | w3     | t2      |
|  2 | p1      | w3     | t4      |
|  3 | p1      | w4     | t1      |
|  4 | p1      | w4     | t3      |
|  5 | p2      | w1     | t1      |
|  6 | p2      | w2     | t1      |
|  7 | p2      | w3     | t1      |
|  8 | p2      | w3     | t2      |
|  9 | p2      | w3     | t3      |
| 10 | p2      | w4     | t2      |
| 11 | p2      | w5     | t1      |
| 12 | p2      | w5     | t3      |
| 13 | p3      | w1     | t3      |
| 14 | p3      | w1     | t4      |
| 15 | p3      | w4     | t2      |
| 16 | p3      | w5     | t2      |
| 17 | p3      | w5     | t4      |

### Getting valid pairs

To get pairs of samples that share the same `label` but come from different
`plate`s and different `well` positions:

```python
from copairs import Matcher
matcher = Matcher(dframe, ['plate', 'well', 'label'], seed=0)
pairs_dict = matcher.get_all_pairs(sameby=['label'], diffby=['plate', 'well'])
```

`pairs_dict` is a `label_id: pairs` dictionary containing the list of valid
pairs for every unique value of `label`. Each pair is a tuple of row indices
into `dframe`:

```
{'t4': [(0, 17), (0, 14), (17, 2), (2, 14)],
 't2': [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)],
 't1': [(3, 11), (3, 5), (3, 6), (3, 7)],
 't3': [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)]}
```
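
To make the `sameby`/`diffby` semantics concrete, here is a minimal brute-force sketch in plain pandas. It does not use copairs. It assumes that a pair is valid when both rows share the `label` value and differ in every `diffby` column, an interpretation inferred from the output above rather than from the library's source. The helper name `brute_force_pairs` is hypothetical, and the rows are copied literally from the table instead of being regenerated with `random`.

```python
from itertools import combinations

import pandas as pd

# Rows copied from the data table (index 0..17).
rows = [
    ("p1", "w2", "t4"), ("p1", "w3", "t2"), ("p1", "w3", "t4"),
    ("p1", "w4", "t1"), ("p1", "w4", "t3"), ("p2", "w1", "t1"),
    ("p2", "w2", "t1"), ("p2", "w3", "t1"), ("p2", "w3", "t2"),
    ("p2", "w3", "t3"), ("p2", "w4", "t2"), ("p2", "w5", "t1"),
    ("p2", "w5", "t3"), ("p3", "w1", "t3"), ("p3", "w1", "t4"),
    ("p3", "w4", "t2"), ("p3", "w5", "t2"), ("p3", "w5", "t4"),
]
dframe = pd.DataFrame(rows, columns=["plate", "well", "label"])


def brute_force_pairs(df, same_col, diff_cols):
    """Hypothetical helper: test every row combination within each group."""
    pairs = {}
    for key, group in df.groupby(same_col):
        valid = set()
        for i, j in combinations(group.index, 2):
            # Keep the pair only if the rows differ in every diff column.
            if all(df.at[i, col] != df.at[j, col] for col in diff_cols):
                valid.add(frozenset((int(i), int(j))))
        pairs[key] = valid
    return pairs


# Output listed in this README, stored as unordered pairs for comparison.
documented = {
    "t4": [(0, 17), (0, 14), (17, 2), (2, 14)],
    "t2": [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)],
    "t1": [(3, 11), (3, 5), (3, 6), (3, 7)],
    "t3": [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)],
}
documented = {k: {frozenset(p) for p in v} for k, v in documented.items()}

brute = brute_force_pairs(dframe, "label", ["plate", "well"])
print(brute == documented)  # True
```

The comparison ignores the order within and between pairs, since the brute-force loop has no reason to reproduce the ordering that `Matcher` returns.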

### Getting valid pairs from a multilabel column

For efficiency reasons, you may not want to repeat a row for every label of
the same well. You can group all the labels of a well into a single row and
use `MatcherMultilabel` to find the corresponding pairs:

```python
dframe_multi = dframe.groupby(['plate', 'well'])['label'].unique().reset_index()
```

|    | plate   | well   | label              |
|---:|:--------|:-------|:-------------------|
|  0 | p1      | w2     | ['t4']             |
|  1 | p1      | w3     | ['t2', 't4']       |
|  2 | p1      | w4     | ['t1', 't3']       |
|  3 | p2      | w1     | ['t1']             |
|  4 | p2      | w2     | ['t1']             |
|  5 | p2      | w3     | ['t1', 't2', 't3'] |
|  6 | p2      | w4     | ['t2']             |
|  7 | p2      | w5     | ['t1', 't3']       |
|  8 | p3      | w1     | ['t3', 't4']       |
|  9 | p3      | w4     | ['t2']             |
| 10 | p3      | w5     | ['t2', 't4']       |
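
As a sanity check on the grouping step, the sketch below confirms that no information is lost: `DataFrame.explode` turns the grouped frame back into the one-row-per-label layout. The rows are copied literally from the first table so the example runs without copairs, and the variable name `restored` is only illustrative.

```python
import pandas as pd

# Rows copied from the one-row-per-label table (index 0..17).
rows = [
    ("p1", "w2", "t4"), ("p1", "w3", "t2"), ("p1", "w3", "t4"),
    ("p1", "w4", "t1"), ("p1", "w4", "t3"), ("p2", "w1", "t1"),
    ("p2", "w2", "t1"), ("p2", "w3", "t1"), ("p2", "w3", "t2"),
    ("p2", "w3", "t3"), ("p2", "w4", "t2"), ("p2", "w5", "t1"),
    ("p2", "w5", "t3"), ("p3", "w1", "t3"), ("p3", "w1", "t4"),
    ("p3", "w4", "t2"), ("p3", "w5", "t2"), ("p3", "w5", "t4"),
]
dframe = pd.DataFrame(rows, columns=["plate", "well", "label"])

# One row per (plate, well); the label column holds an array of labels.
dframe_multi = dframe.groupby(["plate", "well"])["label"].unique().reset_index()

# explode() emits one row per element of each label array.
restored = dframe_multi.explode("label").reset_index(drop=True)

print(len(dframe_multi), len(restored))  # 11 18
print(restored.values.tolist() == dframe.values.tolist())  # True
```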

```python
from copairs import MatcherMultilabel
matcher_multi = MatcherMultilabel(dframe_multi,
                                  columns=['plate', 'well', 'label'],
                                  multilabel_col='label',
                                  seed=0)
pairs_multi = matcher_multi.get_all_pairs(sameby=['label'],
                                          diffby=['plate', 'well'])
```

`pairs_multi` is also a `label_id: pairs` dictionary with the same
structure as before, except that the indices now refer to rows of
`dframe_multi`:

```
{'t4': [(0, 10), (0, 8), (10, 1), (1, 8)],
 't2': [(1, 10), (1, 6), (1, 9), (5, 10), (5, 9), (6, 10)],
 't1': [(2, 7), (2, 3), (2, 4), (2, 5)],
 't3': [(5, 2), (5, 8), (8, 2), (8, 7), (2, 7)]}
```
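
The two outputs describe the same pairs in different index spaces. The sketch below, which uses only pandas, maps each row of `dframe` to its row in `dframe_multi` and converts `pairs_dict` accordingly. It relies on the assumption that `groupby(...).ngroup()` numbers the `(plate, well)` groups in the same sorted order that `groupby(...).unique().reset_index()` produces. The names `row_to_multi` and `converted` are illustrative, and both dictionaries are copied from the outputs listed in this README.

```python
import pandas as pd

# Rows copied from the one-row-per-label table (index 0..17).
rows = [
    ("p1", "w2", "t4"), ("p1", "w3", "t2"), ("p1", "w3", "t4"),
    ("p1", "w4", "t1"), ("p1", "w4", "t3"), ("p2", "w1", "t1"),
    ("p2", "w2", "t1"), ("p2", "w3", "t1"), ("p2", "w3", "t2"),
    ("p2", "w3", "t3"), ("p2", "w4", "t2"), ("p2", "w5", "t1"),
    ("p2", "w5", "t3"), ("p3", "w1", "t3"), ("p3", "w1", "t4"),
    ("p3", "w4", "t2"), ("p3", "w5", "t2"), ("p3", "w5", "t4"),
]
dframe = pd.DataFrame(rows, columns=["plate", "well", "label"])

pairs_dict = {
    "t4": [(0, 17), (0, 14), (17, 2), (2, 14)],
    "t2": [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)],
    "t1": [(3, 11), (3, 5), (3, 6), (3, 7)],
    "t3": [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)],
}
pairs_multi = {
    "t4": [(0, 10), (0, 8), (10, 1), (1, 8)],
    "t2": [(1, 10), (1, 6), (1, 9), (5, 10), (5, 9), (6, 10)],
    "t1": [(2, 7), (2, 3), (2, 4), (2, 5)],
    "t3": [(5, 2), (5, 8), (8, 2), (8, 7), (2, 7)],
}

# Position i holds the dframe_multi row that dframe row i was grouped into.
row_to_multi = dframe.groupby(["plate", "well"]).ngroup().tolist()

converted = {
    label: [(row_to_multi[a], row_to_multi[b]) for a, b in pairs]
    for label, pairs in pairs_dict.items()
}
print(converted == pairs_multi)  # True
```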

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "copairs",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.9",
    "maintainer_email": null,
    "keywords": "pairwise, replication",
    "author": "John Arevalo",
    "author_email": "johnarevalo@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/cf/72/012c3d39ab4ec48f278bbcf66eb920d42c6ad741fdffc55684fed364c342/copairs-0.4.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LICENSE.txt",
    "summary": "Find pairs and compute metrics between them",
    "version": "0.4.1",
    "project_urls": null,
    "split_keywords": [
        "pairwise",
        " replication"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8c643e6eefd6ead522c1c560c6605128204cd0c1b75bf1272c2a3b8b88756a8b",
                "md5": "bc014402ddd3d23e90d1e083677a8374",
                "sha256": "272a74153e522cdaab814221e84749cab139ad30f6c93814681fb662c0e3295d"
            },
            "downloads": -1,
            "filename": "copairs-0.4.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bc014402ddd3d23e90d1e083677a8374",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.9",
            "size": 18317,
            "upload_time": "2024-09-05T22:03:06",
            "upload_time_iso_8601": "2024-09-05T22:03:06.987726Z",
            "url": "https://files.pythonhosted.org/packages/8c/64/3e6eefd6ead522c1c560c6605128204cd0c1b75bf1272c2a3b8b88756a8b/copairs-0.4.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cf72012c3d39ab4ec48f278bbcf66eb920d42c6ad741fdffc55684fed364c342",
                "md5": "74500c13f91113c2c17e8a7a29b59a8a",
                "sha256": "2d5743900e5826811cae7fe7d194f3da899880be41a12a0cba78389874634c10"
            },
            "downloads": -1,
            "filename": "copairs-0.4.1.tar.gz",
            "has_sig": false,
            "md5_digest": "74500c13f91113c2c17e8a7a29b59a8a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.9",
            "size": 15775,
            "upload_time": "2024-09-05T22:03:08",
            "upload_time_iso_8601": "2024-09-05T22:03:08.761458Z",
            "url": "https://files.pythonhosted.org/packages/cf/72/012c3d39ab4ec48f278bbcf66eb920d42c6ad741fdffc55684fed364c342/copairs-0.4.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-05 22:03:08",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "copairs"
}
        