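# miRBench: miRNA target site prediction benchmarks

A collection of datasets and predictors for benchmarking miRNA target site prediction algorithms.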
## Installation
```bash
pip install miRBench
```
## Examples
### Get all available datasets
```python
import miRBench
miRBench.dataset.list_datasets()
```
```python
['AGO2_CLASH_Hejret2023',
'AGO2_eCLIP_Klimentova2022',
'AGO2_eCLIP_Manakov2022']
```
Not every dataset is available in every split and ratio. To see which splits and ratios each dataset provides, pass `full=True`:
```python
miRBench.dataset.list_datasets(full=True)
```
```python
{'AGO2_CLASH_Hejret2023': {'splits': {
'train': {'ratios': ['10']},
'test': {'ratios': ['1', '10', '100']}}},
'AGO2_eCLIP_Klimentova2022': {'splits': {
'test': {'ratios': ['1', '10', '100']}}},
'AGO2_eCLIP_Manakov2022': {'splits': {
'train': {'ratios': ['1', '10', '100']},
'test': {'ratios': ['1', '10', '100']}}}
}
```
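The nested dictionary returned by `list_datasets(full=True)` can be filtered programmatically, for example to find every dataset that offers a given split at a given ratio. This is a minimal sketch that assumes the return value has exactly the shape printed above (`{name: {'splits': {split: {'ratios': [...]}}}}`); the helper `datasets_with` is hypothetical and not part of miRBench. Ratios are compared as strings, matching `ratio="1"` in `get_dataset_df`.

```python
from typing import Dict, List

# Copy of the structure printed above; in practice this would come from
# miRBench.dataset.list_datasets(full=True).
AVAILABLE = {
    'AGO2_CLASH_Hejret2023': {'splits': {
        'train': {'ratios': ['10']},
        'test': {'ratios': ['1', '10', '100']}}},
    'AGO2_eCLIP_Klimentova2022': {'splits': {
        'test': {'ratios': ['1', '10', '100']}}},
    'AGO2_eCLIP_Manakov2022': {'splits': {
        'train': {'ratios': ['1', '10', '100']},
        'test': {'ratios': ['1', '10', '100']}}},
}


def datasets_with(available: Dict[str, dict], split: str, ratio: str) -> List[str]:
    """Hypothetical helper: names of datasets that offer `split` at `ratio`."""
    names = []
    for name, info in available.items():
        # A dataset may lack a split entirely (e.g. no 'train' split)
        ratios = info['splits'].get(split, {}).get('ratios', [])
        if ratio in ratios:
            names.append(name)
    return names


print(datasets_with(AVAILABLE, split='train', ratio='10'))
# prints ['AGO2_CLASH_Hejret2023', 'AGO2_eCLIP_Manakov2022']
```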
### Get dataset
```python
dataset_name = "AGO2_CLASH_Hejret2023"
df = miRBench.dataset.get_dataset_df(dataset_name, split="test", ratio="1")
df.head()
```
|   | noncodingRNA | gene | label |
| - | ------------ | ---- | ----- |
| 0 | TCCGAGCCTGGGTCTCCCTCTT | GGGTTTAGGGAAGGAGGTTCGGAGACAGGGAGCCAAGGCCTCTGTC... | 1 |
| 1 | TGCGGGGCTAGGGCTAACAGCA | GCTTCCCAAGTTAGGTTAGTGATGTGAAATGCTCCTGTCCCTGGCC... | 1 |
| 2 | CCCACTGCCCCAGGTGCTGCTGG | TCTTTCCAAAATTGTCCAGCAGCTTGAATGAGGCAGTGACAATTCT... | 1 |
| 3 | TGAGGGGCAGAGAGCGAGACTTT | CAGAACTGGGATTCAAGCGAGGTCTGGCCCCTCAGTCTGTGGCTTT... | 1 |
| 4 | CAAAGTGCTGTTCGTGCAGGTAG | TTTTTTCCCTTAGGACTCTGCACTTTATAGAATGTTGTAAAACAGA... | 1 |

Datasets are downloaded to `$HOME/.miRBench/datasets`, with a separate subdirectory for each dataset.
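To inspect or clear the local cache, the download location can be rebuilt with `pathlib`. This is a sketch under one assumption: that each dataset's subdirectory is named after the dataset (the text above only says there is one subdirectory per dataset). `cache_root` and `cached_datasets` are hypothetical helpers, not miRBench functions.

```python
from pathlib import Path
from typing import List, Optional


def cache_root(home: Optional[Path] = None) -> Path:
    """Hypothetical helper: the dataset cache directory under `home` ($HOME by default)."""
    base = Path.home() if home is None else Path(home)
    return base / ".miRBench" / "datasets"


def cached_datasets(home: Optional[Path] = None) -> List[str]:
    """List cache subdirectories, assumed to be one per downloaded dataset."""
    root = cache_root(home)
    if not root.is_dir():
        return []  # nothing downloaded yet
    return sorted(p.name for p in root.iterdir() if p.is_dir())


print(cache_root())
print(cached_datasets())
```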
### Get all available tools
```python
miRBench.predictor.list_predictors()
```
```python
['CnnMirTarget_Zheng2020',
'RNACofold',
'miRNA_CNN_Hejret2023',
'miRBind_Klimentova2022',
'TargetNet_Min2021',
'Seed8mer',
'Seed7mer',
'Seed6mer',
'Seed6merBulgeOrMismatch',
'TargetScanCnn_McGeary2019',
'InteractionAwareModel_Yang2024']
```
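Several of the listed tools (`Seed8mer`, `Seed7mer`, `Seed6mer`, `Seed6merBulgeOrMismatch`) are named after canonical miRNA seed-match site types. As a rough illustration of the idea, and not miRBench's implementation, the sketch below checks a target sequence for the textbook 6mer (pairing with miRNA nt 2-7), 7mer-m8 (nt 2-8) and 8mer (nt 2-8 plus an A opposite nt 1) sites. It assumes DNA-alphabet sequences written 5'->3', as in the `noncodingRNA` and `gene` columns above; the exact definitions miRBench uses, including the bulge/mismatch variant, may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, target: str) -> dict:
    """Check `target` for canonical seed sites of `mirna` (both 5'->3', DNA alphabet)."""
    mirna, target = mirna.upper(), target.upper()
    site_7mer_m8 = reverse_complement(mirna[1:8])  # pairs with miRNA nt 2-8
    site_6mer = reverse_complement(mirna[1:7])     # pairs with miRNA nt 2-7
    return {
        # 8mer: the 7mer-m8 site followed by an A opposite miRNA position 1
        "8mer": (site_7mer_m8 + "A") in target,
        "7mer-m8": site_7mer_m8 in target,
        "6mer": site_6mer in target,
    }


let7a = "TGAGGTAGTAGGTTGTATAGTT"  # hsa-let-7a-5p written in DNA letters
print(seed_sites(let7a, "GGGCTACCTCAGGG"))
# prints {'8mer': True, '7mer-m8': True, '6mer': True}
```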
### Encode dataset
```python
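tool = 'miRBind_Klimentova2022'
# Each tool expects its own input format; fetch the encoder for the chosen tool
encoder = miRBench.encoder.get_encoder(tool)
# Encode the dataset into the format expected by the tool
input = encoder(df)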
```
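What an encoder produces depends on the tool. As one illustrative example (a hypothetical helper, not the encoding used by `miRBind_Klimentova2022` or any other miRBench encoder), a miRNA/target pair can be turned into a binary matrix that marks which nucleotide pairs are Watson-Crick complementary; some CNN-based target predictors use 2D pair representations of this general kind.

```python
from typing import List

# Watson-Crick pairs in the DNA alphabet used by the dataset columns
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def complementarity_matrix(mirna: str, target: str) -> List[List[int]]:
    """Hypothetical encoder: matrix[i][j] == 1 if mirna[i] can pair with target[j]."""
    return [
        [1 if (m, t) in WATSON_CRICK else 0 for t in target.upper()]
        for m in mirna.upper()
    ]


for row in complementarity_matrix("ACG", "TGC"):
    print(row)
# prints [1, 0, 0], then [0, 1, 0], then [0, 0, 1]
```

The result has shape len(miRNA) x len(target); a real encoder would also have to pad or crop sequences to fixed lengths before feeding them to a network.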
### Get predictions
```python
predictor = miRBench.predictor.get_predictor(tool)
# Run the tool on the encoded input; the result is a NumPy array of scores
predictions = predictor(input)
predictions[:10]
```
```python
array([0.6899161 , 0.15220629, 0.07301956, 0.43757868, 0.34360734,
0.20519172, 0.0955029 , 0.79298246, 0.14150576, 0.05329492],
dtype=float32)
```
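The predicted scores can be compared with the `label` column of `df` to evaluate a tool. The README does not say which metric the benchmark uses; the sketch below computes ROC AUC as one example, using the pairwise (Mann-Whitney) definition in plain Python. `sklearn.metrics.roc_auc_score(df['label'], predictions)` should give the same value.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, not miRBench results
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # prints 0.75
```

The double loop is quadratic in the number of samples, which is fine for a quick check but slow on large test sets; a rank-based formula or scikit-learn scales better.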
## Benchmark all tools on all datasets
```bash
python benchmark_all.py OUTPUT_FOLDER_PATH
```
The script runs every tool on every dataset and writes one file with the suffix `_predictions.tsv` per dataset into `OUTPUT_FOLDER_PATH`. Each tool's predictions are stored in a separate column.
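The per-dataset `_predictions.tsv` files can be summarized with the standard library. This sketch assumes each file contains a `label` column plus one column per tool named after the tool; the text above only says that every tool's predictions go in a separate column, so check the actual headers first. `mean_score_by_label` is a hypothetical helper, and `ToolA`/`ToolB` are placeholder column names with made-up values.

```python
import csv
import io
from statistics import mean
from typing import Dict, Iterable, List, TextIO


def mean_score_by_label(handle: TextIO, tools: Iterable[str]) -> Dict[str, Dict[int, float]]:
    """Mean predicted score per tool, split by label (assumed column names)."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    summary = {}
    for tool in tools:
        by_label: Dict[int, List[float]] = {}
        for row in rows:
            by_label.setdefault(int(row["label"]), []).append(float(row[tool]))
        summary[tool] = {label: mean(vals) for label, vals in sorted(by_label.items())}
    return summary


# Toy file contents; for a real file use open(path, newline="") instead
toy = io.StringIO(
    "noncodingRNA\tgene\tlabel\tToolA\tToolB\n"
    "TGAGG\tCTACC\t1\t1\t0.75\n"
    "TGAGG\tGGGGG\t0\t0\t0.25\n"
)
print(mean_score_by_label(toy, ["ToolA", "ToolB"]))
# prints {'ToolA': {0: 0.0, 1: 1.0}, 'ToolB': {0: 0.25, 1: 0.75}}
```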
## Package metadata

| Field | Value |
| ----- | ----- |
| Name | miRBench |
| Version | 0.1.1 |
| Summary | A collection of datasets and predictors for benchmarking miRNA target site prediction algorithms |
| Author | Katarina Gresova (gresova11@gmail.com) |
| License | MIT |
| Homepage | https://github.com/katarinagresova/miRBench |
| Keywords | miRNA, target site prediction, benchmarking |
| Source distribution | `mirbench-0.1.1.tar.gz` (15,579 bytes), uploaded 2024-09-27 09:24:51 UTC |
| Download URL | https://files.pythonhosted.org/packages/b0/1c/ca17bb9a6078865d6d958902f53bb45c54d07c1b9d938218a375907e1a1f/mirbench-0.1.1.tar.gz |
| SHA-256 | `61ab615a95e365e2c1cb87d61e8e8e02f3989c8dd568d7baf512de0ed7e75156` |