miRBench


- Name: miRBench
- Version: 1.0.0
- Home page: https://github.com/katarinagresova/miRBench
- Summary: A collection of datasets and predictors for benchmarking miRNA target site prediction algorithms
- Upload time: 2024-10-15 11:37:58
- Author: Katarina Gresova
- License: MIT
- Keywords: miRNA, target site prediction, benchmarking

# miRNA target site prediction Benchmarks

## Installation

The miRBench package can be installed with pip:

```bash
pip install miRBench
```

The default installation only provides access to the datasets. To use predictors and encoders, you need to install additional dependencies.

### Dependencies for predictors and encoders

To use miRBench with predictors and encoders, install the following dependencies:
- numpy
- biopython
- viennarna
- torch
- tensorflow
- typing-extensions

To install the miRBench package with all dependencies into a virtual environment, you can use the following commands:

```bash
python3.8 -m venv mirbench_venv
source mirbench_venv/bin/activate
pip install miRBench
pip install numpy==1.24.3 biopython==1.83 viennarna==2.7.0 torch==1.9.0 tensorflow==2.13.1 typing-extensions==4.5.0
```

Note: This installation runs the predictors on the CPU. To use a GPU, install versions of torch and tensorflow with GPU support.
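
Before calling any predictor or encoder, it can help to confirm that the optional dependencies listed above are importable. The following is a minimal sketch using only the standard library; the helper name `check_optional_dependencies` is hypothetical and not part of miRBench. Note that the import name differs from the pip name for some packages (`biopython` is imported as `Bio`, `viennarna` as `RNA`).

```python
from importlib.util import find_spec

# pip package name -> import name for the optional dependencies listed above.
OPTIONAL_MODULES = {
    "numpy": "numpy",
    "biopython": "Bio",
    "viennarna": "RNA",
    "torch": "torch",
    "tensorflow": "tensorflow",
    "typing-extensions": "typing_extensions",
}


def check_optional_dependencies(modules=OPTIONAL_MODULES):
    """Return {pip_name: bool} telling whether each module can be located."""
    return {
        pip_name: find_spec(module) is not None
        for pip_name, module in modules.items()
    }


# Report the status of every optional dependency in the current environment.
for pip_name, available in check_optional_dependencies().items():
    print(f"{pip_name}: {'installed' if available else 'missing'}")
```

This only checks that the modules can be found; it does not verify the pinned versions.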

## Examples

### Get all available datasets

```python
from miRBench.dataset import list_datasets

list_datasets()
```

```python
['AGO2_CLASH_Hejret2023',
 'AGO2_eCLIP_Klimentova2022',
 'AGO2_eCLIP_Manakov2022']
```

Not all datasets are available with all splits and ratios. To list the splits and ratios available for each dataset, pass `full=True`.

```python
list_datasets(full=True)
```

```python
{'AGO2_CLASH_Hejret2023': {'splits': {
      'train': {'ratios': ['10']},
      'test': {'ratios': ['1', '10', '100']}}},
 'AGO2_eCLIP_Klimentova2022': {'splits': {
      'test': {'ratios': ['1', '10', '100']}}},
 'AGO2_eCLIP_Manakov2022': {'splits': {
      'train': {'ratios': ['1', '10', '100']},
      'test': {'ratios': ['1', '10', '100']}}}
}
```
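
The nested dictionary above can be flattened into the valid `(dataset, split, ratio)` combinations, for example to loop over everything that can be downloaded. This is a small sketch that works on a copy of the output shown above, so it runs without miRBench installed; `iter_combinations` is a hypothetical helper name.

```python
# Structure copied from the list_datasets(full=True) output shown above.
available = {
    "AGO2_CLASH_Hejret2023": {"splits": {
        "train": {"ratios": ["10"]},
        "test": {"ratios": ["1", "10", "100"]}}},
    "AGO2_eCLIP_Klimentova2022": {"splits": {
        "test": {"ratios": ["1", "10", "100"]}}},
    "AGO2_eCLIP_Manakov2022": {"splits": {
        "train": {"ratios": ["1", "10", "100"]},
        "test": {"ratios": ["1", "10", "100"]}}},
}


def iter_combinations(listing):
    """Yield every (dataset_name, split, ratio) present in the listing."""
    for name, info in listing.items():
        for split, split_info in info["splits"].items():
            for ratio in split_info["ratios"]:
                yield name, split, ratio


combos = list(iter_combinations(available))
print(len(combos))  # 13 combinations in the listing above
```

In practice, `available` would come from `list_datasets(full=True)`, and each tuple could be passed on as `get_dataset_df(name, split=split, ratio=ratio)`.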

### Get dataset

```python
from miRBench.dataset import get_dataset_df

dataset_name = "AGO2_CLASH_Hejret2023"
df = get_dataset_df(dataset_name, split="test", ratio="1")
df.head()
```

|   | noncodingRNA | gene | label |
| --- | --- | --- | --- |
| 0 | TCCGAGCCTGGGTCTCCCTCTT | GGGTTTAGGGAAGGAGGTTCGGAGACAGGGAGCCAAGGCCTCTGTC... | 1 |
| 1 | TGCGGGGCTAGGGCTAACAGCA | GCTTCCCAAGTTAGGTTAGTGATGTGAAATGCTCCTGTCCCTGGCC... | 1 |
| 2 | CCCACTGCCCCAGGTGCTGCTGG | TCTTTCCAAAATTGTCCAGCAGCTTGAATGAGGCAGTGACAATTCT... | 1 |
| 3 | TGAGGGGCAGAGAGCGAGACTTT | CAGAACTGGGATTCAAGCGAGGTCTGGCCCCTCAGTCTGTGGCTTT... | 1 |
| 4 | CAAAGTGCTGTTCGTGCAGGTAG | TTTTTTCCCTTAGGACTCTGCACTTTATAGAATGTTGTAAAACAGA... | 1 |

If you only need the path to the dataset file, use the `get_dataset_path` function:

```python
from miRBench.dataset import get_dataset_path

dataset_path = get_dataset_path(dataset_name, split="test", ratio="1")
dataset_path
```

```python
/home/user/.miRBench/datasets/13909173/AGO2_CLASH_Hejret2023/1/test/dataset.tsv
```
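
The returned path points to a `.tsv` file, so it can also be read without pandas. The sketch below assumes the file is tab-separated with a header row matching the DataFrame columns shown earlier (`noncodingRNA`, `gene`, `label`); that layout is an assumption, not something this README states. The sample rows are made-up placeholders so that the snippet runs on its own.

```python
import csv
import io

# Placeholder content standing in for dataset.tsv; the sequences are invented
# and much shorter than real entries.
sample_tsv = (
    "noncodingRNA\tgene\tlabel\n"
    "TGAGGTAGTAGGTTGTATAGTT\tACGTTACCTCAACG\t1\n"
    "TGAGGTAGTAGGTTGTATAGTT\tACGTACGTACGTAC\t0\n"
)


def read_rows(handle):
    """Parse a tab-separated dataset into a list of dicts with integer labels."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{**row, "label": int(row["label"])} for row in reader]


rows = read_rows(io.StringIO(sample_tsv))
positives = sum(row["label"] for row in rows)
print(f"{len(rows)} rows, {positives} positive")
```

For a real file, the same helper could be used as `with open(dataset_path, newline="") as handle: rows = read_rows(handle)`.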

### Get all available tools

```python
from miRBench.predictor import list_predictors

list_predictors()
```

```python
['CnnMirTarget_Zheng2020',
 'RNACofold',
 'miRNA_CNN_Hejret2023',
 'miRBind_Klimentova2022',
 'TargetNet_Min2021',
 'Seed8mer',
 'Seed7mer',
 'Seed6mer',
 'Seed6merBulgeOrMismatch',
 'TargetScanCnn_McGeary2019',
 'InteractionAwareModel_Yang2024']
```

### Encode dataset

```python
from miRBench.encoder import get_encoder

tool = 'miRBind_Klimentova2022'
encoder = get_encoder(tool)

input = encoder(df)
```
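
The encode step here and the predict step in the next section can be chained to run several tools on the same dataset. The sketch below takes the two factory functions as arguments so that it runs without miRBench installed; `run_tools` and the stand-in factories are hypothetical names used for illustration.

```python
def run_tools(tool_names, df, get_encoder, get_predictor):
    """Encode df and predict with each tool; return {tool_name: predictions}."""
    results = {}
    for tool in tool_names:
        encoder = get_encoder(tool)      # tool-specific input encoding
        predictor = get_predictor(tool)  # tool-specific model
        results[tool] = predictor(encoder(df))
    return results


# Stand-in factories so the sketch is runnable on its own.
def fake_get_encoder(tool):
    return lambda data: [len(item) for item in data]


def fake_get_predictor(tool):
    return lambda encoded: [value / 10 for value in encoded]


demo = run_tools(["toolA", "toolB"], ["ACGT", "AC"],
                 fake_get_encoder, fake_get_predictor)
print(demo)
```

With miRBench, the real `get_encoder` and `get_predictor` would be passed instead. Whether every name returned by `list_predictors()` has an encoder registered under the same name is an assumption; this README demonstrates it only for `miRBind_Klimentova2022`.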

### Get predictions

```python
from miRBench.predictor import get_predictor

predictor = get_predictor(tool)

predictions = predictor(input)
predictions[:10]
```

```python
array([0.6899161 , 0.15220629, 0.07301956, 0.43757868, 0.34360734,
       0.20519172, 0.0955029 , 0.79298246, 0.14150576, 0.05329492],
      dtype=float32)
```
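
The predictions are continuous scores, so a cutoff is needed to turn them into binary calls. The following sketch applies a threshold to the ten scores shown above; the helper name `binarize` and the cutoff of 0.5 are assumptions for illustration, and a suitable threshold may differ per tool, since not every tool necessarily outputs a probability.

```python
# The first ten scores from the output shown above.
scores = [0.6899161, 0.15220629, 0.07301956, 0.43757868, 0.34360734,
          0.20519172, 0.0955029, 0.79298246, 0.14150576, 0.05329492]


def binarize(values, threshold=0.5):
    """Map each score to 1 if it reaches the threshold, else 0."""
    return [1 if value >= threshold else 0 for value in values]


calls = binarize(scores)
print(calls)       # [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(sum(calls))  # 2 scores reach the 0.5 cutoff
```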
