rnapy

- Name: rnapy
- Version: 3.2.2
- Home page: https://github.com/linorman/rnapy
- Summary: Unified RNA Analysis Toolkit - ML-powered RNA sequence analysis and structure prediction
- Upload time: 2025-10-14 01:07:22
- Author: Linorman
- Requires Python: >=3.8
- License: MIT
- Keywords: rna, bioinformatics, machine-learning, structure-prediction, sequence-analysis
- Requirements: none recorded

# RNAPy: Unified RNA Analysis Toolkit

RNAPy is a unified Python toolkit that wraps several powerful RNA models with a consistent, easy-to-use API. It currently integrates:

- RNA-FM for sequence embeddings and 2D secondary structure prediction
- RhoFold for 3D structure prediction
- RiboDiffusion for inverse folding (sequence generation from structure)
- RhoDesign for inverse folding (structure-to-sequence, optional 2D guidance)
- RNA-MSM for MSA-based embeddings, attention, consensus, and conservation


## Key Features

- Consistent high-level API via `RNAToolkit`
- Extract sequence embeddings (RNA-FM, mRNA-FM)
- 2D structure prediction (RNA-FM)
- 3D structure prediction (RhoFold)
- Inverse folding (RiboDiffusion, RhoDesign)
- MSA analysis and features (RNA-MSM: embeddings, attention, consensus, conservation)


## Project Structure

```
RNAPy
├── rnapy/                    # Library source
│   ├── core/                 # Base classes, factory, config, exceptions
│   ├── providers/            # Model providers (rna_fm/mrna_fm, rhofold, RiboDiffusion, rhodesign, rna_msm)
│   ├── interfaces/           # Public interfaces
│   └── utils/                # Utilities
├── configs/                  # Global and model configs (YAML)
├── demos/                    # Ready-to-run examples
│   ├── models/               # Put pretrained weights here
│   ├── results/              # Default output location for demos
│   └── demo_*.py             # Demo scripts
├── requirements.txt
├── setup.py
└── README.md
```


## Installation

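Recommended: Python 3.12+ and a recent PyTorch build compatible with your CPU/GPU. The package metadata declares `requires_python >= 3.8`. The command below adds the PyTorch CPU wheel index as an extra package source:

```bash
pip install rnapy --extra-index-url https://download.pytorch.org/whl/cpu
```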


## Documentation

- Toolkit usage guide: `docs/RNAToolkit_Usage_Guide.md`


## Model Weights

- Download pretrained weights from the original repositories listed in the Acknowledgements section.
- Alternatively, the weights used by RNAPy are hosted on Hugging Face: https://huggingface.co/Linorman616/rnapy_models/
- If you do not provide `model-path` when loading a model, RNAPy tries to download the weights from that Hugging Face repository automatically.


## Quick Start

### 1) RNA-FM (2D structure + embeddings)

```python
from rnapy import RNAToolkit

sequence = "AGAUAGUCGUGGGUUCCCUUUCUGGAGGGAGAGGGAAUUCCACGUUGACCGGGGGAACCGGCCAGGCCCGGAAGGGAGCAACCGUGCCCGGCUAUC"

# Initialize
toolkit = RNAToolkit(device="cpu")

# Load the model (RNA-FM shown here)
toolkit.load_model("rna-fm", "./models/RNA-FM_pretrained.pth")

# 2D structure prediction
result = toolkit.predict_structure(
    sequence,
    structure_type="2d",
    model="rna-fm",
    save_dir="./results/rna_fm/demo.ct",
)

# Embeddings
embeddings = toolkit.extract_embeddings(
    sequence,
    model="rna-fm",
    save_dir="./results/rna_fm/embeddings.npy",
)

print(result.get("secondary_structure"))
print(result.get("confidence_scores"))
```
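
The 2D example writes a `.ct` file. As an illustration of that format only (not RNAPy's own writer), the sketch below converts a sequence plus a dot-bracket string into CT text. It assumes a pseudoknot-free structure that uses only `(`, `)` and `.`; the helper names `dot_bracket_to_pairs` and `to_ct` are hypothetical.

```python
def dot_bracket_to_pairs(structure):
    """Return a symmetric {i: j} map (0-based) for a pseudoknot-free dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def to_ct(sequence, structure, title="rna"):
    """Format a sequence and its dot-bracket structure as CT text."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    pairs = dot_bracket_to_pairs(structure)
    n = len(sequence)
    lines = [f"{n} {title}"]
    for i, base in enumerate(sequence):
        nxt = i + 2 if i + 1 < n else 0   # 1-based index of the next base, 0 at the 3' end
        partner = pairs.get(i, -1) + 1    # 1-based pairing partner, 0 if unpaired
        # Columns: index, base, previous index, next index, partner, natural numbering
        lines.append(f"{i + 1} {base} {i} {nxt} {partner} {i + 1}")
    return "\n".join(lines)
```

Calling `to_ct("GAAAC", "(...)")` yields one header line followed by one line per nucleotide.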

### 2) RhoFold (3D structure prediction)

```python
from rnapy import RNAToolkit

sequence = "GGAUCCCGCGCCCCUUUCUCCCCGGUGAUCCCGCGAGCCCCGGUAAGGCCGGGUCC"

toolkit = RNAToolkit(device="cpu")

# Load RhoFold
toolkit.load_model("rhofold", "./models/RhoFold_pretrained.pt")

# Predict 3D
result = toolkit.predict_structure(
    sequence,
    structure_type="3d",
    model="rhofold",
    save_dir="./results/rhofold",
    relax_steps=500,
)

pdb_file = result.get("structure_3d_refined", result.get("structure_3d_unrelaxed"))
print("3D structure:", pdb_file)
```
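
The returned value is a path to a PDB file. If you want to inspect the coordinates without extra dependencies, a minimal sketch like the following reads one atom per residue using the fixed-column layout of PDB `ATOM` records. The choice of `C4'` as the representative atom and both function names are assumptions made for illustration.

```python
def read_atom_coords(pdb_text, atom_name="C4'"):
    """Collect (x, y, z) for every ATOM record whose atom name matches."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name; 31-38, 39-46, 47-54 hold x, y, z.
        if line[12:16].strip() != atom_name:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record (handy for small tests)."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")
```

With a real prediction you would pass the file contents, for example `read_atom_coords(open(pdb_file).read())`.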

### 3) RiboDiffusion (inverse folding from PDB)

```python
from rnapy import RNAToolkit

structure_file = "./input/R1107.pdb"

toolkit = RNAToolkit(device="cpu")

# Load RiboDiffusion
toolkit.load_model("ribodiffusion", "./models/exp_inf.pth")

# Generate sequences from structure
result = toolkit.generate_sequences_from_structure(
    structure_file=structure_file,
    model="ribodiffusion",
    n_samples=2,
    sampling_steps=100,
    cond_scale=0.5,
    dynamic_threshold=True,
    save_dir="./results/ribodiffusion",
)

print("Generated count:", result.get("sequence_count", 0))
print("Output dir:", result.get("output_directory"))
```

### 4) RhoDesign (inverse folding with optional 2D guidance)

```python
from rnapy import RNAToolkit

pdb_path = "./input/2zh6_B.pdb"
ss_path = "./input/2zh6_B.npy"  # optional numpy file with secondary-structure/contact info

toolkit = RNAToolkit(device="cpu")

# Load RhoDesign (with-2D variant checkpoint)
toolkit.load_model("rhodesign", "./models/ss_apexp_best.pth")

# Generate one sequence from structure (RhoDesign samples one sequence per call)
res = toolkit.generate_sequences_from_structure(
    structure_file=pdb_path,
    model="rhodesign",
    secondary_structure_file=ss_path,  # omit or set None to run without 2D guidance
    save_dir="./results/rhodesign"
)

print("Predicted sequence:", res["sequences"][0])
print("Recovery rate:", res.get("quality_metrics", {}).get("sequence_recovery_rate"))
print("FASTA:", res.get("files", {}).get("fasta_files", [None])[0])
```
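
Sequence recovery is commonly defined as the fraction of positions at which the designed sequence matches the native one. The sketch below shows that definition in plain Python; it is an illustration under that assumption, and RNAPy's own `sequence_recovery_rate` may be computed differently.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed base equals the native base."""
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        return 0.0
    matches = sum(a == b for a, b in zip(native.upper(), designed.upper()))
    return matches / len(native)
```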

### 5) RNA-MSM (MSA features, consensus, conservation)

```python
from rnapy import RNAToolkit

# Initialize
toolkit = RNAToolkit(device="cpu")

# Load RNA-MSM
toolkit.load_model("rna-msm", "./models/RNA_MSM_pretrained_weights.pt")

# Prepare an example MSA (aligned sequences)
msa_sequences = [
    "AUGGCGAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCAAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCGAUUUCAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCGAUUUUAUUUACCGCAGUCGUUACCAGCAUACUCGACUUUAAAUGCC",
]

# Extract embeddings (per-position, last layer by default)
features = toolkit.extract_msa_features(
    msa_sequences,
    feature_type="embeddings",
    model="rna-msm",
    save_dir="./results/rna_msm",
)

# Analyze MSA for consensus and conservation
msa_result = toolkit.analyze_msa(
    msa_sequences,
    model="rna-msm",
    extract_consensus=True,
    extract_conservation=True,
    save_dir="./results/rna_msm",
)

print("Consensus:", msa_result.get("consensus_sequence"))
print("Conservation (first 10):", (msa_result.get("conservation_scores") or [])[:10])
```
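
For intuition about what consensus and conservation mean, here is a classical frequency-based sketch: the consensus is the most common symbol per column, and conservation is one minus the normalized Shannon entropy of the column. This is not necessarily how `analyze_msa` derives its values; the function name and the entropy-based score are assumptions.

```python
import math
from collections import Counter


def column_stats(msa, alphabet_size=4):
    """Return (consensus string, per-column conservation) for aligned sequences."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all sequences in an MSA must have the same length")
    max_entropy = math.log2(alphabet_size)
    consensus, conservation = [], []
    for column in zip(*msa):
        counts = Counter(column)
        consensus.append(counts.most_common(1)[0][0])
        depth = len(column)
        entropy = -sum((c / depth) * math.log2(c / depth) for c in counts.values())
        # 1.0 means fully conserved; gap symbols are counted like any other symbol,
        # so columns with more than `alphabet_size` symbols can score below 0.
        conservation.append(1.0 - entropy / max_entropy)
    return "".join(consensus), conservation
```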


## Evaluation Metrics

RNAPy ships with common structural evaluation metrics, available via both the Python API and the CLI.

### LDDT (Local Distance Difference Test)

- Python:

```python
from rnapy.toolkit import RNAToolkit

toolkit = RNAToolkit(device="cpu")
res = toolkit.calculate_lddt(
    reference_structure="./demos/input/2zh6_B.pdb",
    predicted_structure="./demos/input/R1107.pdb",
    radius=15.0,
    distance_thresholds=(0.5, 1.0, 2.0, 4.0),
    return_column_scores=True,
)
print(res["lddt"])            # Global LDDT
print(res.get("columns", [])[:5])  # Optional: first 5 per-residue column scores
```

- CLI:

```bash
rnapy metric lddt \
  --reference ./demos/input/2zh6_B.pdb \
  --predicted ./demos/input/R1107.pdb \
  --radius 15.0 \
  --thresholds 0.5,1.0,2.0,4.0 \
  --return-column-scores
```

Example script: `demos/demo_lddt.py`
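
To make the `radius` and `distance_thresholds` parameters concrete, the following sketch computes a simplified global lDDT. It assumes one representative point per residue and two coordinate lists in the same residue order; the real metric works on all atoms and excludes pairs within the same residue, so treat this as an illustration rather than RNAPy's implementation.

```python
import math
from itertools import combinations


def lddt(reference, predicted, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified lDDT over one point per residue."""
    if len(reference) != len(predicted):
        raise ValueError("coordinate lists must have equal length")
    preserved = [0] * len(thresholds)
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= radius:
            continue  # only pairs that are local in the reference are scored
        total += 1
        diff = abs(d_ref - math.dist(predicted[i], predicted[j]))
        for k, threshold in enumerate(thresholds):
            if diff < threshold:
                preserved[k] += 1
    if total == 0:
        return 0.0
    # Average the preserved fraction over all thresholds.
    return sum(p / total for p in preserved) / len(thresholds)
```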

### RMSD (Root Mean Square Deviation)

- Python:

```python
from rnapy.toolkit import RNAToolkit

toolkit = RNAToolkit()
rmsd = toolkit.calculate_rmsd(
    "./demos/input/rmsd_tests/resources/ci2_1.pdb",
    "./demos/input/rmsd_tests/resources/ci2_2.pdb",
    file_format="pdb",
)
print("RMSD:", rmsd)
```

- CLI (common flags only; see `rnapy metric rmsd --help` for details):

```bash
rnapy metric rmsd \
  --file1 ./demos/input/rmsd_tests/resources/ci2_1.pdb \
  --file2 ./demos/input/rmsd_tests/resources/ci2_2.pdb \
  --file-format pdb \
  --rotation kabsch
```

Other options include `--reorder`, `--reorder-method inertia-hungarian`, `--use-reflections`, `--only-alpha-carbons`, `--ignore-hydrogen`, `--output-aligned-structure`, `--print-only-rmsd-atoms`, and `--gzip-format`.

Example script: `demos/demo_rmsd.py`
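
As a reminder of what the number means, RMSD is the square root of the mean squared distance between matched atoms. The sketch below computes it for two equally ordered coordinate lists, optionally after moving both centroids to the origin. It deliberately omits the optimal rotation that the `--rotation kabsch` option refers to, so it only matches a superposed RMSD when the structures are already rotationally aligned.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def rmsd(p, q, center=True):
    """RMSD between matched points, without any rotational fitting."""
    if len(p) != len(q) or not p:
        raise ValueError("need two non-empty coordinate lists of equal length")
    if center:
        cp, cq = centroid(p), centroid(q)
        p = [tuple(a - c for a, c in zip(pt, cp)) for pt in p]
        q = [tuple(a - c for a, c in zip(pt, cq)) for pt in q]
    squared = sum((a - b) ** 2 for pt, qt in zip(p, q) for a, b in zip(pt, qt))
    return math.sqrt(squared / len(p))
```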

### TM-score

- Python:

```python
from rnapy.toolkit import RNAToolkit

toolkit = RNAToolkit(device="cpu")
result = toolkit.calculate_tm_score(
    structure_1="./demos/input/2zh6_B.pdb",
    structure_2="./demos/input/R1107.pdb",
    mol="rna",
)
print(result["raw_output"])     # Raw TM-score tool output
print(result["tm_score_1"])     # TM-score normalized by length 1
print(result["tm_score_2"])     # TM-score normalized by length 2
```

- CLI:

```bash
rnapy metric tm-score \
  --struct1 ./demos/input/2zh6_B.pdb \
  --struct2 ./demos/input/R1107.pdb \
  --mol rna
```

Example script: `demos/demo_tm_score.py`
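
TM-score sums the term `1 / (1 + (d_i / d0)^2)` over aligned residue pairs and divides by a normalization length, which is why two values are reported, one per structure length. The sketch below evaluates that formula for distances that are already known. The distance scale `d0` depends on length and molecule type and is passed in as a parameter here; the search for the best superposition, which the real tool performs, is not shown.

```python
def tm_score(distances, norm_length, d0):
    """TM-score for given aligned-pair distances, normalized by `norm_length`."""
    if norm_length <= 0 or d0 <= 0:
        raise ValueError("norm_length and d0 must be positive")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length
```

Normalizing the same set of distances by the shorter structure gives the higher of the two scores, since the sum is divided by a smaller number.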


## Sequence Recovery & Structure F1

Sequence recovery and secondary-structure F1 are common quality metrics for design and prediction.

- Python:

```python
from rnapy import RNAToolkit

toolkit = RNAToolkit()

# Structure F1 (dot-bracket)
f1 = toolkit.calculate_structure_f1("(((...)))", "(((.....)))")
print(f1)  # {precision, recall, f1_score}

# Sequence recovery rate
recovery = toolkit.calculate_sequence_recovery("AUGCUAGCUAGC", "AUGCUAGCUUGC")
print(recovery["overall_recovery"])  # overall recovery
```

- CLI:

```bash
# Structure F1
rnapy struct f1 \
  --struct1 "(((...)))" \
  --struct2 "(((.....)))"

# Sequence recovery
rnapy seq recovery \
  --native  AUGCUAGCUAGC \
  --designed AUGCUAGCUUGC
```

Example script: `demos/demo_f1_recovery.py`
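
One common way to score secondary structures is to compare their base-pair sets. The sketch below parses two dot-bracket strings into index pairs and reports precision, recall and F1. It assumes pseudoknot-free notation and mirrors the result keys from the comment above; it is an illustration, not the library's implementation.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def structure_f1(reference, predicted):
    """Precision, recall and F1 of predicted base pairs against a reference."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_positives = len(ref & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}
```

Because pairs are compared by position, this definition presumes that both strings describe the same sequence and therefore have the same length.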


## Command Line Interface (CLI)

The package installs a console script named `rnapy` through a setuptools entry point. After installation, you can run `rnapy` from your shell.

- Show top-level help:
  - `rnapy --help`
- Show help for a subcommand:
  - `rnapy seq embed --help`

### Global options

These options are shared by all subcommands:

- `--device {cpu,cuda}`: Computing device (default: `cpu`)
- `--model {rna-fm,mrna-fm,rhofold,ribodiffusion,rhodesign,rna-msm}`: Model provider (required)
- `--model-path PATH`: Path to the model checkpoint (required)
- `--config-dir PATH`: Configuration directory (default: `configs`)
- `--provider-config PATH`: Optional provider-specific config file
- `--seed INT`: Random seed
- `--save-dir DIR`: Output directory
- `--verbose` or `-v`: Verbose logs and full tracebacks on errors

Input conventions:

- Use exactly one of `--seq` or `--fasta`
  - `--seq` accepts a single RNA sequence or multiple sequences separated by commas
  - `--fasta` accepts a `.fasta/.fa/.fas` file path
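
The input convention above can be modeled with a mutually exclusive argument group. The sketch below is a hypothetical stand-alone parser, not RNAPy's actual CLI code: it accepts either `--seq` (comma-separated) or `--fasta`, and turns both into a list of sequences.

```python
import argparse


def build_parser():
    """Parser that requires exactly one of --seq or --fasta."""
    parser = argparse.ArgumentParser(prog="demo")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--seq", help="one sequence, or several separated by commas")
    group.add_argument("--fasta", help="path to a .fasta/.fa/.fas file")
    return parser


def read_fasta(text):
    """Return the sequences in FASTA text, joining wrapped lines."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append("")       # a header starts a new record
        elif sequences:
            sequences[-1] += line      # sequence lines extend the current record
    return sequences


def sequences_from_args(args):
    """Resolve parsed arguments into a list of sequences."""
    if args.seq is not None:
        return [s.strip() for s in args.seq.split(",") if s.strip()]
    with open(args.fasta) as handle:
        return read_fasta(handle.read())
```

Passing both flags makes `argparse` exit with a usage error, which matches the "exactly one" rule.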

### Subcommands

1) Sequence embeddings

Extract embeddings from RNA-FM/mRNA-FM:

```bash
rnapy seq embed \
  --model rna-fm \
  --model-path ./models/RNA-FM_pretrained.pth \
  --seq "AGAUAGUCGUGGGU...UCGGCUAUC" \
  --layer -1 \
  --format mean \
  --save-dir ./results/rna_fm
```

- `--layer`: which layer to use (default: `-1`, i.e., last layer)
- `--format {raw,mean,bos}`: output format (default: `mean`)
- You can also pass `--fasta path/to/input.fasta` instead of `--seq`
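
One plausible reading of the three `--format` values is sketched below: `raw` keeps the per-token matrix, `mean` averages over tokens, and `bos` keeps the vector of the first (beginning-of-sequence) token. This interpretation, the token ordering, and the function name are assumptions; check the usage guide for the exact semantics.

```python
def pool_embeddings(token_embeddings, mode="mean"):
    """Reduce a list of per-token vectors (BOS token assumed first) to one output."""
    if mode == "raw":
        return token_embeddings
    if mode == "bos":
        return token_embeddings[0]
    if mode == "mean":
        count = len(token_embeddings)
        # Average each embedding dimension across all tokens.
        return [sum(column) / count for column in zip(*token_embeddings)]
    raise ValueError(f"unknown mode: {mode!r}")
```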

2) Structure prediction

Predict 2D (RNA-FM) or 3D (RhoFold) structure:

```bash
# 2D with RNA-FM
rnapy struct predict \
  --model rna-fm \
  --model-path ./models/RNA-FM_pretrained.pth \
  --seq "AGAUAGUCGUGGGU...UCGGCUAUC" \
  --structure-type 2d \
  --save-dir ./results/rna_fm_struct

# 3D with RhoFold (--structure-type is inferred as 3d)
rnapy struct predict \
  --model rhofold \
  --model-path ./models/RhoFold_pretrained.pt \
  --seq "GGAUCCCGCGCCC...GCCGGGUCC" \
  --save-dir ./results/rhofold_3d
```

- If `--structure-type` is omitted: `rhofold` -> `3d`; `rna-fm`/`mrna-fm` -> `2d`
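
The default rule above amounts to a lookup keyed by model name. A minimal sketch, with a hypothetical function name and no claim about how the CLI implements it:

```python
DEFAULT_STRUCTURE_TYPE = {"rhofold": "3d", "rna-fm": "2d", "mrna-fm": "2d"}


def resolve_structure_type(model, structure_type=None):
    """Return the explicit structure type, or the model's default if omitted."""
    if structure_type is not None:
        return structure_type
    try:
        return DEFAULT_STRUCTURE_TYPE[model]
    except KeyError:
        raise ValueError(f"no default structure type for model {model!r}") from None
```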

3) Inverse folding (generate sequences from structure)

RiboDiffusion and RhoDesign take a PDB as input:

```bash
# RiboDiffusion: generate multiple sequences
rnapy invfold gen \
  --model ribodiffusion \
  --model-path ./models/exp_inf.pth \
  --pdb ./input/R1107.pdb \
  --n-samples 2 \
  --save-dir ./results/ribodiffusion

# RhoDesign: optional 2D guidance via NPY
rnapy invfold gen \
  --model rhodesign \
  --model-path ./models/ss_apexp_best.pth \
  --pdb ./input/2zh6_B.pdb \
  --ss-npy ./input/2zh6_B.npy \
  --save-dir ./results/rhodesign
```

- `--pdb`: required
- `--ss-npy`: optional; only used by RhoDesign (2D guidance)
- `--n-samples`: number of sequences to sample (RhoDesign samples one per call; RiboDiffusion supports many)

4) MSA features (RNA-MSM)

Extract embeddings/attention from an aligned MSA:

```bash
rnapy msa features \
  --model rna-msm \
  --model-path ./models/RNA_MSM_pretrained_weights.pt \
  --fasta ./input/example_msa.fasta \
  --feature-type embeddings \
  --layer -1 \
  --save-dir ./results/rna_msm_features
```

- `--feature-type {embeddings,attention,both}` (default: `embeddings`)
- `--layer`: which layer to extract (default: `-1`)

5) MSA analysis (RNA-MSM)

Compute consensus and/or conservation from an MSA:

```bash
rnapy msa analyze \
  --model rna-msm \
  --model-path ./models/RNA_MSM_pretrained_weights.pt \
  --fasta ./input/example_msa.fasta \
  --extract-consensus \
  --extract-conservation \
  --save-dir ./results/rna_msm_analyze
```

- Passing a single sequence via `--seq` raises an error, because this subcommand needs multiple sequences or a FASTA file

6) Metrics (structure evaluation)

- LDDT: see examples above, or run `rnapy metric lddt --help`
- RMSD: see examples above, or run `rnapy metric rmsd --help`
- TM-score: see examples above, or run `rnapy metric tm-score --help`

7) Sequence and structure utilities

- Structure F1: `rnapy struct f1 --struct1 ... --struct2 ...`
- Sequence recovery: `rnapy seq recovery --native ... --designed ...`

### Outputs and logging

- When `--save-dir` is provided, results are written under that directory. The exact filenames depend on the provider/task (e.g., `.npy` for embeddings, `.ct` for 2D, `.pdb`/folder for 3D, `.json` for analysis summaries). The CLI prints a brief summary and (when applicable) a path hint.
- Exit codes: `0` on success; non-zero on errors. Add `-v/--verbose` for full tracebacks.
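
Because the CLI signals failure through its exit code, a script that drives it can check that code directly. The wrapper below is a generic sketch built on `subprocess.run`; the function name is hypothetical, and any command list can be passed, for example `["rnapy", "--help"]`.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command, return its stdout, and raise if the exit code is non-zero."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Forward the tool's error output before failing.
        print(completed.stderr, file=sys.stderr, end="")
        raise RuntimeError(f"{command[0]} exited with code {completed.returncode}")
    return completed.stdout
```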

### Common pitfalls

- Do not pass both `--seq` and `--fasta` at the same time.
- Ensure the `--model-path` points to the correct checkpoint for the chosen `--model`.
- `rhofold` defaults to 3D; RNA-FM/mRNA-FM default to 2D if `--structure-type` is omitted.
- `msa analyze` requires multiple sequences (comma-separated via `--seq`) or a FASTA file.


## Run the Demos

From the repository root:

```powershell
# mRNA-FM / RNA-FM demo
cd .\demos
python .\demo_rna_fm.py

# RhoFold demo
python .\demo_rhofold.py

# RiboDiffusion demo
python .\demo_ribodiffusion.py

# RhoDesign demo
python .\demo_rhodesign.py

# RNA-MSM demo
python .\demo_rna_msm.py

# LDDT demo
python .\demo_lddt.py

# RMSD demo
python .\demo_rmsd.py

# TM-score demo
python .\demo_tm_score.py

# Sequence recovery & Structure F1 demo
python .\demo_f1_recovery.py
```

Additional examples may be available: `rna_fm_demo.py`, `rhofold_demo.py`, `ribodiffusion_demo.py`.

## Datasets

You can download example datasets (such as Rfam, RNA Puzzles, and CASP15) through the Python API or the CLI.

- Available dataset names: `Rfam`, `Rfam_original`, `RNA_Puzzles`, `CASP15`, `RNAsolo2`

- CLI:

```bash
# List available datasets
rnapy dataset list

# Download Rfam (from the HF mirror) with parallel workers
rnapy dataset download --dataset Rfam --max-workers 8
```

- Python:

```python
from rnapy.toolkit import RNAToolkit

toolkit = RNAToolkit()
print(toolkit.list_available_datasets())
toolkit.download_dataset("Rfam", max_workers=8)
```

## Configuration

YAML configs are provided under `./configs/` and `./demos/configs/`. You can:

- Pass `config_dir` to `RNAToolkit` to use custom defaults
- Override per-call parameters in `load_model(...)` and task methods
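
The two options above suggest a layered lookup in which per-call parameters take precedence over YAML defaults. The sketch below shows one way such layering could work with plain dictionaries. The precedence order, the function name, and the keys in the usage note are assumptions for illustration and are not taken from RNAPy's config files.

```python
def merge_config(*layers):
    """Merge dictionaries left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Merge nested sections instead of replacing them wholesale.
                merged[key] = merge_config(merged[key], value)
            else:
                merged[key] = value
    return merged
```

For example, `merge_config({"device": "cpu", "rhofold": {"relax_steps": 1000}}, {"rhofold": {"relax_steps": 500}})` keeps `device` and overrides only `relax_steps`.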

## License

MIT License


## Acknowledgements

- RNA-FM: https://github.com/ml4bio/RNA-FM
- RhoFold: https://github.com/ml4bio/RhoFold
- RiboDiffusion: https://github.com/ml4bio/RiboDiffusion
- RhoDesign: https://github.com/ml4bio/RhoDesign
- RNA-MSM: https://github.com/yikunpku/RNA-MSM

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/linorman/rnapy",
    "name": "rnapy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "RNA, bioinformatics, machine-learning, structure-prediction, sequence-analysis",
    "author": "Linorman",
    "author_email": "Linorman <zyh52616@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/64/f8/3d1d0bdadea95b60736a70f94306587d6e6e750cd36971fa444095b84700/rnapy-3.2.2.tar.gz",
    "platform": null,
    "description": "# RNAPy \u2014 Unified RNA Analysis Toolkit\r\n\r\nRNAPy is a unified Python toolkit that wraps several powerful RNA models with a consistent, easy-to-use API. It currently integrates:\r\n\r\n- RNA-FM for sequence embeddings and 2D secondary structure prediction\r\n- RhoFold for 3D structure prediction\r\n- RiboDiffusion for inverse folding (sequence generation from structure)\r\n- RhoDesign for inverse folding (structure-to-sequence, optional 2D guidance)\r\n- RNA-MSM for MSA-based embeddings, attention, consensus, and conservation\r\n\r\n\r\n## Key Features\r\n\r\n- Consistent high-level API via `RNAToolkit`\r\n- Extract sequence embeddings (RNA-FM, mRNA-FM)\r\n- 2D structure prediction (RNA-FM)\r\n- 3D structure prediction (RhoFold)\r\n- Inverse folding (RiboDiffusion, RhoDesign)\r\n- MSA analysis and features (RNA-MSM: embeddings, attention, consensus, conservation)\r\n\r\n\r\n## Project Structure\r\n\r\n```\r\nRNAPy\r\n\u251c\u2500\u2500 rnapy/                    # Library source\r\n\u2502   \u251c\u2500\u2500 core/                 # Base classes, factory, config, exceptions\r\n\u2502   \u251c\u2500\u2500 providers/            # Model providers (rna_fm/mrna_fm, rhofold, RiboDiffusion, rhodesign, rna_msm)\r\n\u2502   \u251c\u2500\u2500 interfaces/           # Public interfaces\r\n\u2502   \u2514\u2500\u2500 utils/                # Utilities\r\n\u251c\u2500\u2500 configs/                  # Global and model configs (YAML)\r\n\u251c\u2500\u2500 demos/                    # Ready-to-run examples\r\n\u2502   \u251c\u2500\u2500 models/               # Put pretrained weights here\r\n\u2502   \u251c\u2500\u2500 results/              # Default output location for demos\r\n\u2502   \u2514\u2500\u2500 demo_*.py             # Demo scripts\r\n\u251c\u2500\u2500 requirements.txt\r\n\u251c\u2500\u2500 setup.py\r\n\u2514\u2500\u2500 README.md\r\n```\r\n\r\n\r\n## Installation\r\n\r\nRecommended: Python 3.12+ and a recent PyTorch build compatible with your 
CPU/GPU.\r\n\r\n```\r\npip install rnapy --extra-index-url  https://download.pytorch.org/whl/cpu \r\n```\r\n\r\n\r\n## Documentation\r\n\r\n- Toolkit usage guide: `docs/RNAToolkit_Usage_Guide.md`\r\n\r\n\r\n## Model Weights\r\n\r\n- You can download pretrained weights from the original repositories which will be mentioned in the Acknowledgements section.\r\n- Or you can find weights used in RNAPy on Hugging Face:\r\nhttps://huggingface.co/Linorman616/rnapy_models/\r\n- Actually if you don't provide `model-path` when loading a model, RNAPy will try to download the weights from this repo automatically.\r\n\r\n\r\n## Quick Start\r\n\r\n### 1) RNA-FM (2D structure + embeddings)\r\n\r\n```python\r\nfrom rnapy import RNAToolkit\r\n\r\nsequence = \"AGAUAGUCGUGGGUUCCCUUUCUGGAGGGAGAGGGAAUUCCACGUUGACCGGGGGAACCGGCCAGGCCCGGAAGGGAGCAACCGUGCCCGGCUAUC\"\r\n\r\n# Initialize\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\n\r\n# Load model (choose one)\r\ntoolkit.load_model(\"rna-fm\", \"./models/RNA-FM_pretrained.pth\")\r\n\r\n# 2D structure prediction\r\nresult = toolkit.predict_structure(\r\n    sequence,\r\n    structure_type=\"2d\",\r\n    model=\"rna-fm\",\r\n    save_dir=\"./results/rna_fm/demo.ct\",\r\n)\r\n\r\n# Embeddings\r\nembeddings = toolkit.extract_embeddings(\r\n    sequence,\r\n    model=\"rna-fm\",\r\n    save_dir=\"./results/rna_fm/embeddings.npy\",\r\n)\r\n\r\nprint(result.get(\"secondary_structure\"))\r\nprint(result.get(\"confidence_scores\"))\r\n```\r\n\r\n### 2) RhoFold (3D structure prediction)\r\n\r\n```python\r\nfrom rnapy import RNAToolkit\r\n\r\nsequence = \"GGAUCCCGCGCCCCUUUCUCCCCGGUGAUCCCGCGAGCCCCGGUAAGGCCGGGUCC\"\r\n\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\n\r\n# Load RhoFold\r\ntoolkit.load_model(\"rhofold\", \"./models/RhoFold_pretrained.pt\")\r\n\r\n# Predict 3D\r\nresult = toolkit.predict_structure(\r\n    sequence,\r\n    structure_type=\"3d\",\r\n    model=\"rhofold\",\r\n    save_dir=\"./results/rhofold\",\r\n    
relax_steps=500,\r\n)\r\n\r\npdb_file = result.get(\"structure_3d_refined\", result.get(\"structure_3d_unrelaxed\"))\r\nprint(\"3D structure:\", pdb_file)\r\n```\r\n\r\n### 3) RiboDiffusion (inverse folding from PDB)\r\n\r\n```python\r\nfrom rnapy import RNAToolkit\r\n\r\nstructure_file = \"./input/R1107.pdb\"\r\n\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\n\r\n# Load RiboDiffusion\r\ntoolkit.load_model(\"ribodiffusion\", \"./models/exp_inf.pth\")\r\n\r\n# Generate sequences from structure\r\nresult = toolkit.generate_sequences_from_structure(\r\n    structure_file=structure_file,\r\n    model=\"ribodiffusion\",\r\n    n_samples=2,\r\n    sampling_steps=100,\r\n    cond_scale=0.5,\r\n    dynamic_threshold=True,\r\n    save_dir=\"./results/ribodiffusion\",\r\n)\r\n\r\nprint(\"Generated count:\", result.get(\"sequence_count\", 0))\r\nprint(\"Output dir:\", result.get(\"output_directory\"))\r\n```\r\n\r\n### 4) RhoDesign (inverse folding with optional 2D guidance)\r\n\r\n```python\r\nfrom rnapy import RNAToolkit\r\n\r\npdb_path = \"./input/2zh6_B.pdb\"\r\nss_path = \"./input/2zh6_B.npy\"  # optional numpy file with secondary-structure/contact info\r\n\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\n\r\n# Load RhoDesign (with-2D variant checkpoint)\r\ntoolkit.load_model(\"rhodesign\", \"./models/ss_apexp_best.pth\")\r\n\r\n# Generate one sequence from structure (RhoDesign samples one sequence per call)\r\nres = toolkit.generate_sequences_from_structure(\r\n    structure_file=pdb_path,\r\n    model=\"rhodesign\",\r\n    secondary_structure_file=ss_path,  # omit or set None to run without 2D guidance\r\n    save_dir=\"./results/rhodesign\"\r\n)\r\n\r\nprint(\"Predicted sequence:\", res[\"sequences\"][0])\r\nprint(\"Recovery rate:\", res.get(\"quality_metrics\", {}).get(\"sequence_recovery_rate\"))\r\nprint(\"FASTA:\", res.get(\"files\", {}).get(\"fasta_files\", [None])[0])\r\n```\r\n\r\n### 5) RNA-MSM (MSA features, consensus, conservation)\r\n\r\n```python\r\nfrom rnapy 
import RNAToolkit\r\n\r\n# Initialize\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\n\r\n# Load RNA-MSM\r\ntoolkit.load_model(\"rna-msm\", \"./models/RNA_MSM_pretrained_weights.pt\")\r\n\r\n# Prepare an example MSA (aligned sequences)\r\nmsa_sequences = [\r\n    \"AUGGCGAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC\",\r\n    \"AUGGCAAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC\",\r\n    \"AUGGCGAUUUCAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC\",\r\n    \"AUGGCGAUUUUAUUUACCGCAGUCGUUACCAGCAUACUCGACUUUAAAUGCC\",\r\n]\r\n\r\n# Extract embeddings (per-position, last layer by default)\r\nfeatures = toolkit.extract_msa_features(\r\n    msa_sequences,\r\n    feature_type=\"embeddings\",\r\n    model=\"rna-msm\",\r\n    save_dir=\"./results/rna_msm\",\r\n)\r\n\r\n# Analyze MSA for consensus and conservation\r\nmsa_result = toolkit.analyze_msa(\r\n    msa_sequences,\r\n    model=\"rna-msm\",\r\n    extract_consensus=True,\r\n    extract_conservation=True,\r\n    save_dir=\"./results/rna_msm\",\r\n)\r\n\r\nprint(\"Consensus:\", msa_result.get(\"consensus_sequence\"))\r\nprint(\"Conservation (first 10):\", (msa_result.get(\"conservation_scores\") or [])[:10])\r\n```\r\n\r\n\r\n## Evaluation Metrics\r\n\r\nRNAPy ships with common structural evaluation metrics, available via both the Python API and the CLI.\r\n\r\n### LDDT (Local Distance Difference Test)\r\n\r\n- Python:\r\n\r\n```python\r\nfrom rnapy.toolkit import RNAToolkit\r\n\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\nres = toolkit.calculate_lddt(\r\n    reference_structure=\"./demos/input/2zh6_B.pdb\",\r\n    predicted_structure=\"./demos/input/R1107.pdb\",\r\n    radius=15.0,\r\n    distance_thresholds=(0.5, 1.0, 2.0, 4.0),\r\n    return_column_scores=True,\r\n)\r\nprint(res[\"lddt\"])            # Global LDDT\r\nprint(res.get(\"columns\", [])[:5])  # Optional: first 5 per-residue column scores\r\n```\r\n\r\n- CLI:\r\n\r\n```bash\r\nrnapy metric lddt \\\r\n  --reference ./demos/input/2zh6_B.pdb \\\r\n  --predicted 
./demos/input/R1107.pdb \\\r\n  --radius 15.0 \\\r\n  --thresholds 0.5,1.0,2.0,4.0 \\\r\n  --return-column-scores\r\n```\r\n\r\nExample script: `demos/demo_lddt.py`\r\n\r\n### RMSD (Root Mean Square Deviation)\r\n\r\n- Python:\r\n\r\n```python\r\nfrom rnapy.toolkit import RNAToolkit\r\n\r\ntoolkit = RNAToolkit()\r\nrmsd = toolkit.calculate_rmsd(\r\n    \"./demos/input/rmsd_tests/resources/ci2_1.pdb\",\r\n    \"./demos/input/rmsd_tests/resources/ci2_2.pdb\",\r\n    file_format=\"pdb\",\r\n)\r\nprint(\"RMSD:\", rmsd)\r\n```\r\n\r\n- CLI (common flags only; see `rnapy metric rmsd --help` for details):\r\n\r\n```bash\r\nrnapy metric rmsd \\\r\n  --file1 ./demos/input/rmsd_tests/resources/ci2_1.pdb \\\r\n  --file2 ./demos/input/rmsd_tests/resources/ci2_2.pdb \\\r\n  --file-format pdb \\\r\n  --rotation kabsch\r\n```\r\n\r\nOther options include: `--reorder`, `--reorder-method inertia-hungarian`, `--use-reflections`, `--only-alpha-carbons`, `--ignore-hydrogen`, `--output-aligned-structure`, `--print-only-rmsd-atoms`, `--gzip-format`, etc.\r\n\r\nExample script: `demos/demo_rmsd.py`\r\n\r\n### TM-score\r\n\r\n- Python:\r\n\r\n```python\r\nfrom rnapy.toolkit import RNAToolkit\r\n\r\ntoolkit = RNAToolkit(device=\"cpu\")\r\nresult = toolkit.calculate_tm_score(\r\n    structure_1=\"./demos/input/2zh6_B.pdb\",\r\n    structure_2=\"./demos/input/R1107.pdb\",\r\n    mol=\"rna\",\r\n)\r\nprint(result[\"raw_output\"])     # Raw TM-score tool output\r\nprint(result[\"tm_score_1\"])     # TM-score normalized by length 1\r\nprint(result[\"tm_score_2\"])     # TM-score normalized by length 2\r\n```\r\n\r\n- CLI:\r\n\r\n```bash\r\nrnapy metric tm-score \\\r\n  --struct1 ./demos/input/2zh6_B.pdb \\\r\n  --struct2 ./demos/input/R1107.pdb \\\r\n  --mol rna\r\n```\r\n\r\nExample script: `demos/demo_tm_score.py`\r\n\r\n\r\n## Sequence Recovery & Structure F1\r\n\r\nSequence recovery and secondary-structure F1 are common quality metrics for design and prediction.\r\n\r\n- 
Python:\r\n\r\n```python\r\nfrom rnapy import RNAToolkit\r\n\r\ntoolkit = RNAToolkit()\r\n\r\n# Structure F1 (dot-bracket)\r\nf1 = toolkit.calculate_structure_f1(\"(((...)))\", \"(((.....)))\")\r\nprint(f1)  # {precision, recall, f1_score}\r\n\r\n# Sequence recovery rate\r\nrecovery = toolkit.calculate_sequence_recovery(\"AUGCUAGCUAGC\", \"AUGCUAGCUUGC\")\r\nprint(recovery[\"overall_recovery\"])  # overall recovery\r\n```\r\n\r\n- CLI:\r\n\r\n```bash\r\n# Structure F1\r\nrnapy struct f1 \\\r\n  --struct1 \"(((...)))\" \\\r\n  --struct2 \"(((.....)))\"\r\n\r\n# Sequence recovery\r\nrnapy seq recovery \\\r\n  --native  AUGCUAGCUAGC \\\r\n  --designed AUGCUAGCUUGC\r\n```\r\n\r\nExample script: `demos/demo_f1_recovery.py`\r\n\r\n\r\n## Command Line Interface (CLI)\r\n\r\nThe package installs a console script named `rnapy` (via setup entry point). After installation, you can run `rnapy` from your shell.\r\n\r\n- Show top-level help:\r\n  - `rnapy --help`\r\n- Show help for a subcommand:\r\n  - `rnapy seq embed --help`\r\n\r\n### Global options\r\n\r\nThese options are shared by all subcommands:\r\n\r\n- `--device {cpu,cuda}`: Computing device (default: `cpu`)\r\n- `--model {rna-fm,mrna-fm,rhofold,ribodiffusion,rhodesign,rna-msm}`: Model provider (required)\r\n- `--model-path PATH`: Path to the model checkpoint (required)\r\n- `--config-dir PATH`: Configuration directory (default: `configs`)\r\n- `--provider-config PATH`: Optional provider-specific config file\r\n- `--seed INT`: Random seed\r\n- `--save-dir DIR`: Output directory\r\n- `--verbose` or `-v`: Verbose logs and full tracebacks on errors\r\n\r\nInput conventions:\r\n\r\n- Use exactly one of `--seq` or `--fasta`\r\n  - `--seq` accepts a single RNA sequence or multiple sequences separated by commas\r\n  - `--fasta` accepts a `.fasta/.fa/.fas` file path\r\n\r\n### Subcommands\r\n\r\n1) Sequence embeddings\r\n\r\nExtract embeddings from RNA-FM/mRNA-FM:\r\n\r\n```bash\r\nrnapy seq embed \\\r\n  --model rna-fm \\\r\n 
 --model-path ./models/RNA-FM_pretrained.pth \\\r\n  --seq \"AGAUAGUCGUGGGU...UCGGCUAUC\" \\\r\n  --layer -1 \\\r\n  --format mean \\\r\n  --save-dir ./results/rna_fm\r\n```\r\n\r\n- `--layer`: which layer to use (default: `-1`, i.e., last layer)\r\n- `--format {raw,mean,bos}`: output format (default: `mean`)\r\n- You can also pass `--fasta path/to/input.fasta` instead of `--seq`\r\n\r\n2) Structure prediction\r\n\r\nPredict 2D RNA-FM or 3D (RhoFold) structure:\r\n\r\n```bash\r\n# 2D with mRNA-FM\r\nrnapy struct predict \\\r\n  --model rna-fm \\\r\n  --model-path ./models/RNA-FM_pretrained.pth \\\r\n  --seq \"AGAUAGUCGUGGGU...UCGGCUAUC\" \\\r\n  --structure-type 2d \\\r\n  --save-dir ./results/rna_fm_struct\r\n\r\n# 3D with RhoFold (structure-type will auto-infer to 3d)\r\nrnapy struct predict \\\r\n  --model rhofold \\\r\n  --model-path ./models/RhoFold_pretrained.pt \\\r\n  --seq \"GGAUCCCGCGCCC...GCCGGGUCC\" \\\r\n  --save-dir ./results/rhofold_3d\r\n```\r\n\r\n- If `--structure-type` is omitted: `rhofold` -> `3d`; `rna-fm`/`mrna-fm` -> `2d`\r\n\r\n3) Inverse folding (generate sequences from structure)\r\n\r\nRiboDiffusion and RhoDesign take a PDB as input:\r\n\r\n```bash\r\n# RiboDiffusion: generate multiple sequences\r\nrnapy invfold gen \\\r\n  --model ribodiffusion \\\r\n  --model-path ./models/exp_inf.pth \\\r\n  --pdb ./input/R1107.pdb \\\r\n  --n-samples 2 \\\r\n  --save-dir ./results/ribodiffusion\r\n\r\n# RhoDesign: optional 2D guidance via NPY\r\nrnapy invfold gen \\\r\n  --model rhodesign \\\r\n  --model-path ./models/ss_apexp_best.pth \\\r\n  --pdb ./input/2zh6_B.pdb \\\r\n  --ss-npy ./input/2zh6_B.npy \\\r\n  --save-dir ./results/rhodesign\r\n```\r\n\r\n- `--pdb`: required\r\n- `--ss-npy`: optional; only used by RhoDesign (2D guidance)\r\n- `--n-samples`: number of sequences to sample (RhoDesign samples one per call; RiboDiffusion supports many)\r\n\r\n4) MSA features (RNA-MSM)\r\n\r\nExtract embeddings/attention from an aligned 
MSA:\r\n\r\n```bash\r\nrnapy msa features \\\r\n  --model rna-msm \\\r\n  --model-path ./models/RNA_MSM_pretrained_weights.pt \\\r\n  --fasta ./input/example_msa.fasta \\\r\n  --feature-type embeddings \\\r\n  --layer -1 \\\r\n  --save-dir ./results/rna_msm_features\r\n```\r\n\r\n- `--feature-type {embeddings,attention,both}` (default: `embeddings`)\r\n- `--layer`: which layer to extract (default: `-1`)\r\n\r\n5) MSA analysis (RNA-MSM)\r\n\r\nCompute consensus and/or conservation from an MSA:\r\n\r\n```bash\r\nrnapy msa analyze \\\r\n  --model rna-msm \\\r\n  --model-path ./models/RNA_MSM_pretrained_weights.pt \\\r\n  --fasta ./input/example_msa.fasta \\\r\n  --extract-consensus \\\r\n  --extract-conservation \\\r\n  --save-dir ./results/rna_msm_analyze\r\n```\r\n\r\n- If you pass a single `--seq` (not multiple), this subcommand will error because it requires multiple sequences or a FASTA file\r\n\r\n6) Metrics (structure evaluation)\r\n\r\n- LDDT: see examples above, or run `rnapy metric lddt --help`\r\n- RMSD: see examples above, or run `rnapy metric rmsd --help`\r\n- TM-score: see examples above, or run `rnapy metric tm-score --help`\r\n\r\n7) Sequence utilities\r\n\r\n- Structure F1: `rnapy struct f1 --struct1 ... --struct2 ...`\r\n- Sequence recovery: `rnapy seq recovery --native ... --designed ...`\r\n\r\n### Outputs and logging\r\n\r\n- When `--save-dir` is provided, results are written under that directory. The exact filenames depend on the provider/task (e.g., `.npy` for embeddings, `.ct` for 2D, `.pdb`/folder for 3D, `.json` for analysis summaries). The CLI prints a brief summary and (when applicable) a path hint.\r\n- Exit codes: `0` on success; non-zero on errors. 
Add `-v/--verbose` for full tracebacks.\r\n\r\n### Common pitfalls\r\n\r\n- Do not pass both `--seq` and `--fasta` at the same time.\r\n- Ensure `--model-path` points to the correct checkpoint for the chosen `--model`.\r\n- `rhofold` defaults to 3D; RNA-FM/mRNA-FM default to 2D if `--structure-type` is omitted.\r\n- `msa analyze` requires multiple sequences (comma-separated via `--seq`) or a FASTA file.\r\n\r\n\r\n## Run the Demos\r\n\r\nFrom the repository root:\r\n\r\n```powershell\r\n# mRNA-FM / RNA-FM demo\r\ncd .\\demos\r\npython .\\demo_rna_fm.py\r\n\r\n# RhoFold demo\r\npython .\\demo_rhofold.py\r\n\r\n# RiboDiffusion demo\r\npython .\\demo_ribodiffusion.py\r\n\r\n# RhoDesign demo\r\npython .\\demo_rhodesign.py\r\n\r\n# RNA-MSM demo\r\npython .\\demo_rna_msm.py\r\n\r\n# LDDT demo\r\npython .\\demo_lddt.py\r\n\r\n# RMSD demo\r\npython .\\demo_rmsd.py\r\n\r\n# TM-score demo\r\npython .\\demo_tm_score.py\r\n\r\n# Sequence recovery & Structure F1 demo\r\npython .\\demo_f1_recovery.py\r\n```\r\n\r\nAdditional examples may be available: `rna_fm_demo.py`, `rhofold_demo.py`, `ribodiffusion_demo.py`.\r\n\r\n## Datasets\r\n\r\nYou can download example datasets via the Python API or the CLI (e.g., Rfam, RNA Puzzles, CASP15).\r\n\r\n- Available dataset names: `Rfam`, `Rfam_original`, `RNA_Puzzles`, `CASP15`, `RNAsolo2`\r\n\r\n- CLI:\r\n\r\n```bash\r\n# List available datasets\r\nrnapy dataset list\r\n\r\n# Download Rfam (from the HF mirror) with parallel workers\r\nrnapy dataset download --dataset Rfam --max-workers 8\r\n```\r\n\r\n- Python:\r\n\r\n```python\r\nfrom rnapy.toolkit import RNAToolkit\r\n\r\ntoolkit = RNAToolkit()\r\nprint(toolkit.list_available_datasets())\r\ntoolkit.download_dataset(\"Rfam\", max_workers=8)\r\n```\r\n\r\n## Configuration\r\n\r\nYAML configs are provided under `./configs/` and `./demos/configs/`. 
You can:\r\n\r\n- Pass `config_dir` to `RNAToolkit` to use custom defaults\r\n- Override per-call parameters in `load_model(...)` and task methods\r\n\r\n## License\r\n\r\nMIT License\r\n\r\n\r\n## Acknowledgements\r\n\r\n- RNA-FM: https://github.com/ml4bio/RNA-FM\r\n- RhoFold: https://github.com/ml4bio/RhoFold\r\n- RiboDiffusion: https://github.com/ml4bio/RiboDiffusion\r\n- RhoDesign: https://github.com/ml4bio/RhoDesign\r\n- RNA-MSM: https://github.com/yikunpku/RNA-MSM\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Unified RNA Analysis Toolkit - ML-powered RNA sequence analysis and structure prediction",
    "version": "3.2.2",
    "project_urls": {
        "Bug Reports": "https://github.com/linorman/rnapy/issues",
        "Documentation": "https://github.com/linorman/rnapy/blob/main/README.md",
        "Homepage": "https://github.com/linorman/rnapy",
        "Repository": "https://github.com/linorman/rnapy"
    },
    "split_keywords": [
        "rna",
        " bioinformatics",
        " machine-learning",
        " structure-prediction",
        " sequence-analysis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8709b8681423b50d45b68f7187bd6f74ddedd0f012ac8e5389d295cdb2b42e3f",
                "md5": "d73d38d03601ea66aa4929cc15133b19",
                "sha256": "99c3e1e61865b1e2073b65d96296cbe8917a5f2584d7793e3893beec3e14086d"
            },
            "downloads": -1,
            "filename": "rnapy-3.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d73d38d03601ea66aa4929cc15133b19",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 18161121,
            "upload_time": "2025-10-14T01:07:13",
            "upload_time_iso_8601": "2025-10-14T01:07:13.660900Z",
            "url": "https://files.pythonhosted.org/packages/87/09/b8681423b50d45b68f7187bd6f74ddedd0f012ac8e5389d295cdb2b42e3f/rnapy-3.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "64f83d1d0bdadea95b60736a70f94306587d6e6e750cd36971fa444095b84700",
                "md5": "8f5c81af581ab5cfbc0777c254d15a59",
                "sha256": "a158ed9c1637fdade3f043864c40c1e39aa4d2dcbc5e46203ecf33d721a2055a"
            },
            "downloads": -1,
            "filename": "rnapy-3.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8f5c81af581ab5cfbc0777c254d15a59",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18169748,
            "upload_time": "2025-10-14T01:07:22",
            "upload_time_iso_8601": "2025-10-14T01:07:22.558597Z",
            "url": "https://files.pythonhosted.org/packages/64/f8/3d1d0bdadea95b60736a70f94306587d6e6e750cd36971fa444095b84700/rnapy-3.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-14 01:07:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "linorman",
    "github_project": "rnapy",
    "github_not_found": true,
    "lcname": "rnapy"
}
        