# RNAPy — Unified RNA Analysis Toolkit
RNAPy is a unified Python toolkit that wraps several powerful RNA models with a consistent, easy-to-use API. It currently integrates:
- RNA-FM for sequence embeddings and 2D secondary structure prediction
- RhoFold for 3D structure prediction
- RiboDiffusion for inverse folding (sequence generation from structure)
- RhoDesign for inverse folding (structure-to-sequence, optional 2D guidance)
- RNA-MSM for MSA-based embeddings, attention, consensus, and conservation
## Key Features
- Consistent high-level API via `RNAToolkit`
- Extract sequence embeddings (RNA-FM, mRNA-FM)
- 2D structure prediction (RNA-FM)
- 3D structure prediction (RhoFold)
- Inverse folding (RiboDiffusion, RhoDesign)
- MSA analysis and features (RNA-MSM: embeddings, attention, consensus, conservation)
## Project Structure
```
RNAPy
├── rnapy/ # Library source
│ ├── core/ # Base classes, factory, config, exceptions
│ ├── providers/ # Model providers (rna_fm/mrna_fm, rhofold, RiboDiffusion, rhodesign, rna_msm)
│ ├── interfaces/ # Public interfaces
│ └── utils/ # Utilities
├── configs/ # Global and model configs (YAML)
├── demos/ # Ready-to-run examples
│ ├── models/ # Put pretrained weights here
│ ├── results/ # Default output location for demos
│ └── demo_*.py # Demo scripts
├── requirements.txt
├── setup.py
└── README.md
```
## Installation
Recommended: Python 3.12+ and a recent PyTorch build compatible with your CPU/GPU.
```
pip install rnapy --extra-index-url https://download.pytorch.org/whl/cpu
```
## Documentation
- Toolkit usage guide: `docs/RNAToolkit_Usage_Guide.md`
## Model Weights
- You can download pretrained weights from the original repositories listed in the Acknowledgements section.
- Alternatively, the weights used in RNAPy are hosted on Hugging Face: https://huggingface.co/Linorman616/rnapy_models/
- If you don't provide `model-path` when loading a model, RNAPy tries to download the weights from this Hugging Face repository automatically.
## Quick Start
### 1) RNA-FM (2D structure + embeddings)
```python
from rnapy import RNAToolkit
sequence = "AGAUAGUCGUGGGUUCCCUUUCUGGAGGGAGAGGGAAUUCCACGUUGACCGGGGGAACCGGCCAGGCCCGGAAGGGAGCAACCGUGCCCGGCUAUC"
# Initialize
toolkit = RNAToolkit(device="cpu")
# Load model (choose one)
toolkit.load_model("rna-fm", "./models/RNA-FM_pretrained.pth")
# 2D structure prediction
result = toolkit.predict_structure(
    sequence,
    structure_type="2d",
    model="rna-fm",
    save_dir="./results/rna_fm/demo.ct",
)
# Embeddings
embeddings = toolkit.extract_embeddings(
    sequence,
    model="rna-fm",
    save_dir="./results/rna_fm/embeddings.npy",
)
print(result.get("secondary_structure"))
print(result.get("confidence_scores"))
```
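The 2D result above is saved with a `.ct` extension. As a rough illustration of what a connectivity-table file encodes, the sketch below converts a dot-bracket string into CT-style rows (index, base, previous index, next index, pairing partner, index). The helper name `dot_bracket_to_ct` is hypothetical and not part of RNAPy, and the exact file RNAPy writes may differ in its header or spacing.

```python
def dot_bracket_to_ct(sequence: str, structure: str, title: str = "demo") -> str:
    """Render a sequence and its dot-bracket structure as CT-style text.

    Hypothetical helper for illustration only; it handles plain '(' and ')'
    pairs and treats every other symbol as unpaired.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    n = len(sequence)
    partner = [0] * n  # 0 means unpaired; CT indices are 1-based
    stack = []
    for i, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partner[i - 1], partner[j - 1] = j, i
    if stack:
        raise ValueError("unbalanced '(' in structure")

    lines = [f"{n} {title}"]  # header: sequence length and a free-text title
    for i, base in enumerate(sequence, start=1):
        next_index = i + 1 if i < n else 0
        lines.append(f"{i} {base} {i - 1} {next_index} {partner[i - 1]} {i}")
    return "\n".join(lines)


print(dot_bracket_to_ct("GGGAAACCC", "(((...)))"))
```

Each row after the header describes one nucleotide, so the pairing partner column is enough to reconstruct the dot-bracket string.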
### 2) RhoFold (3D structure prediction)
```python
from rnapy import RNAToolkit
sequence = "GGAUCCCGCGCCCCUUUCUCCCCGGUGAUCCCGCGAGCCCCGGUAAGGCCGGGUCC"
toolkit = RNAToolkit(device="cpu")
# Load RhoFold
toolkit.load_model("rhofold", "./models/RhoFold_pretrained.pt")
# Predict 3D
result = toolkit.predict_structure(
    sequence,
    structure_type="3d",
    model="rhofold",
    save_dir="./results/rhofold",
    relax_steps=500,
)
pdb_file = result.get("structure_3d_refined", result.get("structure_3d_unrelaxed"))
print("3D structure:", pdb_file)
```
### 3) RiboDiffusion (inverse folding from PDB)
```python
from rnapy import RNAToolkit
structure_file = "./input/R1107.pdb"
toolkit = RNAToolkit(device="cpu")
# Load RiboDiffusion
toolkit.load_model("ribodiffusion", "./models/exp_inf.pth")
# Generate sequences from structure
result = toolkit.generate_sequences_from_structure(
    structure_file=structure_file,
    model="ribodiffusion",
    n_samples=2,
    sampling_steps=100,
    cond_scale=0.5,
    dynamic_threshold=True,
    save_dir="./results/ribodiffusion",
)
print("Generated count:", result.get("sequence_count", 0))
print("Output dir:", result.get("output_directory"))
```
### 4) RhoDesign (inverse folding with optional 2D guidance)
```python
from rnapy import RNAToolkit
pdb_path = "./input/2zh6_B.pdb"
ss_path = "./input/2zh6_B.npy" # optional numpy file with secondary-structure/contact info
toolkit = RNAToolkit(device="cpu")
# Load RhoDesign (with-2D variant checkpoint)
toolkit.load_model("rhodesign", "./models/ss_apexp_best.pth")
# Generate one sequence from structure (RhoDesign samples one sequence per call)
res = toolkit.generate_sequences_from_structure(
    structure_file=pdb_path,
    model="rhodesign",
    secondary_structure_file=ss_path,  # omit or set None to run without 2D guidance
    save_dir="./results/rhodesign",
)
print("Predicted sequence:", res["sequences"][0])
print("Recovery rate:", res.get("quality_metrics", {}).get("sequence_recovery_rate"))
print("FASTA:", res.get("files", {}).get("fasta_files", [None])[0])
```
### 5) RNA-MSM (MSA features, consensus, conservation)
```python
from rnapy import RNAToolkit
# Initialize
toolkit = RNAToolkit(device="cpu")
# Load RNA-MSM
toolkit.load_model("rna-msm", "./models/RNA_MSM_pretrained_weights.pt")
# Prepare an example MSA (aligned sequences)
msa_sequences = [
    "AUGGCGAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCAAUUUUAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCGAUUUCAUUUACCGCAGUCGUUACCAACAUACUCGACUUUAAAUGCC",
    "AUGGCGAUUUUAUUUACCGCAGUCGUUACCAGCAUACUCGACUUUAAAUGCC",
]
# Extract embeddings (per-position, last layer by default)
features = toolkit.extract_msa_features(
    msa_sequences,
    feature_type="embeddings",
    model="rna-msm",
    save_dir="./results/rna_msm",
)
# Analyze MSA for consensus and conservation
msa_result = toolkit.analyze_msa(
    msa_sequences,
    model="rna-msm",
    extract_consensus=True,
    extract_conservation=True,
    save_dir="./results/rna_msm",
)
print("Consensus:", msa_result.get("consensus_sequence"))
print("Conservation (first 10):", (msa_result.get("conservation_scores") or [])[:10])
```
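To build intuition for the consensus and conservation outputs, here is a minimal model-free sketch: a column-wise majority vote and an entropy-based conservation score. This is an assumption about what such scores typically mean, not RNA-MSM's or RNAPy's actual computation, and the function name `consensus_and_conservation` is hypothetical.

```python
import math
from collections import Counter


def consensus_and_conservation(msa, alphabet_size=4):
    """Return (consensus string, per-column conservation in [0, 1]).

    Conservation is 1 - H / log2(alphabet_size), where H is the Shannon
    entropy of the column. Gap characters are counted like any other symbol,
    so scores are clamped at 0 for columns with more than 4 symbols.
    """
    if not msa or len({len(row) for row in msa}) != 1:
        raise ValueError("expected a non-empty MSA with equal-length rows")

    max_entropy = math.log2(alphabet_size)
    consensus, conservation = [], []
    for column in zip(*msa):
        counts = Counter(column)
        consensus.append(counts.most_common(1)[0][0])  # majority symbol
        entropy = -sum(
            (c / len(column)) * math.log2(c / len(column))
            for c in counts.values()
        )
        conservation.append(max(0.0, 1.0 - entropy / max_entropy))
    return "".join(consensus), conservation


toy_msa = ["AUGC", "AUGC", "AUGA", "AUGC"]
cons, scores = consensus_and_conservation(toy_msa)
print(cons, [round(s, 3) for s in scores])
```

A fully conserved column scores 1.0, and a column split evenly across all four nucleotides scores 0.0.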
## Evaluation Metrics
RNAPy ships with common structural evaluation metrics, available via both the Python API and the CLI.
### LDDT (Local Distance Difference Test)
- Python:
```python
from rnapy.toolkit import RNAToolkit
toolkit = RNAToolkit(device="cpu")
res = toolkit.calculate_lddt(
    reference_structure="./demos/input/2zh6_B.pdb",
    predicted_structure="./demos/input/R1107.pdb",
    radius=15.0,
    distance_thresholds=(0.5, 1.0, 2.0, 4.0),
    return_column_scores=True,
)
print(res["lddt"]) # Global LDDT
print(res.get("columns", [])[:5]) # Optional: first 5 per-residue column scores
```
- CLI:
```bash
rnapy metric lddt \
  --reference ./demos/input/2zh6_B.pdb \
  --predicted ./demos/input/R1107.pdb \
  --radius 15.0 \
  --thresholds 0.5,1.0,2.0,4.0 \
  --return-column-scores
```
Example script: `demos/demo_lddt.py`
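The `radius` and `distance_thresholds` parameters map onto the usual LDDT definition: take all pairs that lie within the inclusion radius in the reference, then count how many of those distances are reproduced in the prediction within each threshold. The sketch below is a simplified version under stated assumptions (one representative point per residue, residues already matched by index); it is not RNAPy's implementation and the function name `lddt` is illustrative.

```python
import math


def lddt(reference, predicted, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global LDDT over one representative point per residue.

    reference, predicted: equal-length lists of (x, y, z) tuples in which
    index i refers to the same residue in both structures.
    """
    if len(reference) != len(predicted):
        raise ValueError("structures must have the same number of residues")

    preserved = 0
    considered = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only contacts that are local in the reference count
            d_pred = math.dist(predicted[i], predicted[j])
            diff = abs(d_ref - d_pred)
            for threshold in thresholds:
                considered += 1
                if diff < threshold:
                    preserved += 1
    return preserved / considered if considered else 0.0


ref = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 3.0)]
print(lddt(ref, ref), lddt(ref, pred))
```

Because only distances are compared, the score needs no superposition of the two structures.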
### RMSD (Root Mean Square Deviation)
- Python:
```python
from rnapy.toolkit import RNAToolkit
toolkit = RNAToolkit()
rmsd = toolkit.calculate_rmsd(
    "./demos/input/rmsd_tests/resources/ci2_1.pdb",
    "./demos/input/rmsd_tests/resources/ci2_2.pdb",
    file_format="pdb",
)
print("RMSD:", rmsd)
```
- CLI (common flags only; see `rnapy metric rmsd --help` for details):
```bash
rnapy metric rmsd \
  --file1 ./demos/input/rmsd_tests/resources/ci2_1.pdb \
  --file2 ./demos/input/rmsd_tests/resources/ci2_2.pdb \
  --file-format pdb \
  --rotation kabsch
```
Other options include: `--reorder`, `--reorder-method inertia-hungarian`, `--use-reflections`, `--only-alpha-carbons`, `--ignore-hydrogen`, `--output-aligned-structure`, `--print-only-rmsd-atoms`, `--gzip-format`, etc.
Example script: `demos/demo_rmsd.py`
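For reference, the quantity being reported is the root of the mean squared distance between matched atoms. The sketch below computes it after removing only the translation (both coordinate sets are moved to their centroids). It deliberately leaves out the rotational fit that `--rotation kabsch` performs, so it is an upper bound on the fitted RMSD, not a replacement for it. The function name `centered_rmsd` is hypothetical.

```python
import math


def centered_rmsd(coords_a, coords_b):
    """RMSD between matched points after centroid alignment (no rotation)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("expected two non-empty, equally sized coordinate lists")

    def centroid(coords):
        n = len(coords)
        return tuple(sum(p[k] for p in coords) / n for k in range(3))

    ca, cb = centroid(coords_a), centroid(coords_b)
    total = 0.0
    for p, q in zip(coords_a, coords_b):
        # Squared distance between the two points once both sets are centered.
        total += sum(((p[k] - ca[k]) - (q[k] - cb[k])) ** 2 for k in range(3))
    return math.sqrt(total / len(coords_a))


a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(centered_rmsd(a, b))
```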
### TM-score
- Python:
```python
from rnapy.toolkit import RNAToolkit
toolkit = RNAToolkit(device="cpu")
result = toolkit.calculate_tm_score(
    structure_1="./demos/input/2zh6_B.pdb",
    structure_2="./demos/input/R1107.pdb",
    mol="rna",
)
print(result["raw_output"]) # Raw TM-score tool output
print(result["tm_score_1"]) # TM-score normalized by length 1
print(result["tm_score_2"]) # TM-score normalized by length 2
```
- CLI:
```bash
rnapy metric tm-score \
  --struct1 ./demos/input/2zh6_B.pdb \
  --struct2 ./demos/input/R1107.pdb \
  --mol rna
```
Example script: `demos/demo_tm_score.py`
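The two values `tm_score_1` and `tm_score_2` differ only in the normalization length. A minimal sketch of the standard TM-score sum makes this visible. It assumes the aligned-pair distances after superposition are already known, and it takes the distance scale `d0` as a parameter because the underlying tool chooses it from the sequence length and molecule type. The function name `tm_score` is illustrative, not an RNAPy API.

```python
def tm_score(distances, norm_length, d0):
    """TM-score from distances of aligned residue pairs after superposition.

    distances:   distance (in angstroms) for each aligned residue pair
    norm_length: length of the structure used for normalization
    d0:          length-dependent distance scale chosen by the scoring tool
    """
    if norm_length <= 0 or d0 <= 0:
        raise ValueError("norm_length and d0 must be positive")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length


aligned = [0.5, 1.0, 2.0]  # toy distances for three aligned pairs
print(tm_score(aligned, norm_length=3, d0=1.0))  # normalized by the shorter chain
print(tm_score(aligned, norm_length=6, d0=1.0))  # normalized by the longer chain
```

Normalizing by the longer structure always gives the smaller score, since unaligned residues contribute nothing to the sum.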
## Sequence Recovery & Structure F1
Sequence recovery and secondary-structure F1 are common quality metrics for design and prediction.
- Python:
```python
from rnapy import RNAToolkit
toolkit = RNAToolkit()
# Structure F1 (dot-bracket)
f1 = toolkit.calculate_structure_f1("(((...)))", "(((.....)))")
print(f1) # {precision, recall, f1_score}
# Sequence recovery rate
recovery = toolkit.calculate_sequence_recovery("AUGCUAGCUAGC", "AUGCUAGCUUGC")
print(recovery["overall_recovery"]) # overall recovery
```
- CLI:
```bash
# Structure F1
rnapy struct f1 \
  --struct1 "(((...)))" \
  --struct2 "(((.....)))"
# Sequence recovery
rnapy seq recovery \
  --native AUGCUAGCUAGC \
  --designed AUGCUAGCUUGC
```
Example script: `demos/demo_f1_recovery.py`
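As a rough guide to what these two numbers measure, the sketch below computes them from first principles: recovery as the fraction of identical positions, and F1 over the sets of base pairs parsed from dot-bracket strings. The function names are hypothetical, the result keys simply mirror the ones shown above, and RNAPy's own implementation may handle edge cases (unequal lengths, pseudoknot brackets) differently.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by '(' and ')'."""
    stack, pairs = [], set()
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def structure_f1(reference, predicted):
    """Precision, recall and F1 of predicted base pairs against a reference."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_positives = len(ref & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}


def sequence_recovery(native, designed):
    """Fraction of positions where the designed base equals the native base."""
    if not native or len(native) != len(designed):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(native, designed)) / len(native)


print(structure_f1("((..))", "(....)"))
print(sequence_recovery("AUGCUAGCUAGC", "AUGCUAGCUUGC"))
```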
## Command Line Interface (CLI)
The package installs a console script named `rnapy` (via setup entry point). After installation, you can run `rnapy` from your shell.
- Show top-level help:
  - `rnapy --help`
- Show help for a subcommand:
  - `rnapy seq embed --help`
### Global options
These options are shared by all subcommands:
- `--device {cpu,cuda}`: Computing device (default: `cpu`)
- `--model {rna-fm,mrna-fm,rhofold,ribodiffusion,rhodesign,rna-msm}`: Model provider (required)
- `--model-path PATH`: Path to the model checkpoint (required)
- `--config-dir PATH`: Configuration directory (default: `configs`)
- `--provider-config PATH`: Optional provider-specific config file
- `--seed INT`: Random seed
- `--save-dir DIR`: Output directory
- `--verbose` or `-v`: Verbose logs and full tracebacks on errors
Input conventions:
- Use exactly one of `--seq` or `--fasta`
  - `--seq` accepts a single RNA sequence or multiple sequences separated by commas
  - `--fasta` accepts a `.fasta/.fa/.fas` file path
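If you prepare inputs in your own scripts, the same convention can be mirrored with a small helper. This is a sketch under assumptions: `read_sequences` is a hypothetical function, not part of RNAPy, and the FASTA parsing is the plain header-and-lines layout without any validation of the alphabet.

```python
from pathlib import Path


def read_sequences(seq=None, fasta=None):
    """Return a list of sequences from exactly one of the two input styles."""
    if (seq is None) == (fasta is None):
        raise ValueError("provide exactly one of seq or fasta")

    if seq is not None:
        # Comma-separated string: one entry per sequence, whitespace ignored.
        return [part.strip() for part in seq.split(",") if part.strip()]

    sequences, current = [], []
    for line in Path(fasta).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:  # a new header closes the previous record
                sequences.append("".join(current))
                current = []
        elif line:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


print(read_sequences(seq="AUGGCGAUU, AUGGCAAUU"))
```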
### Subcommands
1) Sequence embeddings
Extract embeddings from RNA-FM/mRNA-FM:
```bash
rnapy seq embed \
  --model rna-fm \
  --model-path ./models/RNA-FM_pretrained.pth \
  --seq "AGAUAGUCGUGGGU...UCGGCUAUC" \
  --layer -1 \
  --format mean \
  --save-dir ./results/rna_fm
```
- `--layer`: which layer to use (default: `-1`, i.e., last layer)
- `--format {raw,mean,bos}`: output format (default: `mean`)
- You can also pass `--fasta path/to/input.fasta` instead of `--seq`
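The three `--format` values are most naturally read as pooling choices over the per-token embedding matrix. The sketch below shows that reading with plain Python lists. It is an assumption based on the option names: the README does not say, for example, whether `mean` includes special tokens, and `pool_embeddings` is a hypothetical helper.

```python
def pool_embeddings(token_embeddings, fmt="mean"):
    """Reduce a [tokens x dims] matrix according to an output format.

    raw  -> the full per-token matrix, unchanged
    bos  -> the vector at the first (beginning-of-sequence) position
    mean -> the element-wise average over all token positions
    """
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    if fmt == "raw":
        return token_embeddings
    if fmt == "bos":
        return token_embeddings[0]
    if fmt == "mean":
        n = len(token_embeddings)
        return [sum(column) / n for column in zip(*token_embeddings)]
    raise ValueError(f"unknown format: {fmt!r}")


toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, 2 dimensions
print(pool_embeddings(toy, "mean"), pool_embeddings(toy, "bos"))
```

`mean` and `bos` give one fixed-size vector per sequence, which is usually what downstream classifiers expect, while `raw` keeps per-nucleotide detail.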
2) Structure prediction
Predict a 2D (RNA-FM) or 3D (RhoFold) structure:
```bash
# 2D with RNA-FM
rnapy struct predict \
  --model rna-fm \
  --model-path ./models/RNA-FM_pretrained.pth \
  --seq "AGAUAGUCGUGGGU...UCGGCUAUC" \
  --structure-type 2d \
  --save-dir ./results/rna_fm_struct
# 3D with RhoFold (--structure-type is inferred as 3d)
rnapy struct predict \
  --model rhofold \
  --model-path ./models/RhoFold_pretrained.pt \
  --seq "GGAUCCCGCGCCC...GCCGGGUCC" \
  --save-dir ./results/rhofold_3d
```
- If `--structure-type` is omitted: `rhofold` -> `3d`; `rna-fm`/`mrna-fm` -> `2d`
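The inference rule above amounts to a small lookup table. The sketch below restates it in code. Only the mapping comes from this README; the function name `infer_structure_type` and the error raised for other models are illustrative assumptions.

```python
# Defaults as documented for the CLI when --structure-type is omitted.
DEFAULT_STRUCTURE_TYPE = {
    "rhofold": "3d",
    "rna-fm": "2d",
    "mrna-fm": "2d",
}


def infer_structure_type(model, structure_type=None):
    """Return the explicit structure type, or the model's documented default."""
    if structure_type is not None:
        return structure_type
    try:
        return DEFAULT_STRUCTURE_TYPE[model]
    except KeyError:
        # Hypothetical behavior: models without a default must be explicit.
        raise ValueError(f"no default structure type for model {model!r}") from None


print(infer_structure_type("rhofold"), infer_structure_type("rna-fm"))
```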
3) Inverse folding (generate sequences from structure)
RiboDiffusion and RhoDesign take a PDB as input:
```bash
# RiboDiffusion: generate multiple sequences
rnapy invfold gen \
  --model ribodiffusion \
  --model-path ./models/exp_inf.pth \
  --pdb ./input/R1107.pdb \
  --n-samples 2 \
  --save-dir ./results/ribodiffusion
# RhoDesign: optional 2D guidance via NPY
rnapy invfold gen \
  --model rhodesign \
  --model-path ./models/ss_apexp_best.pth \
  --pdb ./input/2zh6_B.pdb \
  --ss-npy ./input/2zh6_B.npy \
  --save-dir ./results/rhodesign
```
- `--pdb`: required
- `--ss-npy`: optional; only used by RhoDesign (2D guidance)
- `--n-samples`: number of sequences to sample (RhoDesign samples one per call; RiboDiffusion supports many)
4) MSA features (RNA-MSM)
Extract embeddings/attention from an aligned MSA:
```bash
rnapy msa features \
  --model rna-msm \
  --model-path ./models/RNA_MSM_pretrained_weights.pt \
  --fasta ./input/example_msa.fasta \
  --feature-type embeddings \
  --layer -1 \
  --save-dir ./results/rna_msm_features
```
- `--feature-type {embeddings,attention,both}` (default: `embeddings`)
- `--layer`: which layer to extract (default: `-1`)
5) MSA analysis (RNA-MSM)
Compute consensus and/or conservation from an MSA:
```bash
rnapy msa analyze \
  --model rna-msm \
  --model-path ./models/RNA_MSM_pretrained_weights.pt \
  --fasta ./input/example_msa.fasta \
  --extract-consensus \
  --extract-conservation \
  --save-dir ./results/rna_msm_analyze
```
- If you pass a single `--seq` (not multiple), this subcommand will error because it requires multiple sequences or a FASTA file
6) Metrics (structure evaluation)
- LDDT: see examples above, or run `rnapy metric lddt --help`
- RMSD: see examples above, or run `rnapy metric rmsd --help`
- TM-score: see examples above, or run `rnapy metric tm-score --help`
7) Sequence utilities
- Structure F1: `rnapy struct f1 --struct1 ... --struct2 ...`
- Sequence recovery: `rnapy seq recovery --native ... --designed ...`
### Outputs and logging
- When `--save-dir` is provided, results are written under that directory. The exact filenames depend on the provider/task (e.g., `.npy` for embeddings, `.ct` for 2D, `.pdb`/folder for 3D, `.json` for analysis summaries). The CLI prints a brief summary and (when applicable) a path hint.
- Exit codes: `0` on success; non-zero on errors. Add `-v/--verbose` for full tracebacks.
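Because the CLI signals failure through its exit code, it can be driven from a pipeline script with the standard library alone. The sketch below wraps `subprocess.run` in a hypothetical `run_cli` helper. It uses the Python interpreter as a stand-in command so that it runs without RNAPy installed; in practice you would pass something like `["rnapy", "dataset", "list"]`.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command and return (exit code, stdout, stderr) as text."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in command so the sketch is runnable anywhere; swap in an rnapy call.
code, out, err = run_cli([sys.executable, "-c", "print('ok')"])
if code != 0:
    print("command failed:", err, file=sys.stderr)
else:
    print("command succeeded:", out.strip())
```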
### Common pitfalls
- Do not pass both `--seq` and `--fasta` at the same time.
- Ensure the `--model-path` points to the correct checkpoint for the chosen `--model`.
- `rhofold` defaults to 3D; RNA-FM/mRNA-FM default to 2D if `--structure-type` is omitted.
- `msa analyze` requires multiple sequences (comma-separated via `--seq`) or a FASTA file.
## Run the Demos
From the repository root:
```powershell
# mRNA-FM / RNA-FM demo
cd .\demos
python .\demo_rna_fm.py
# RhoFold demo
python .\demo_rhofold.py
# RiboDiffusion demo
python .\demo_ribodiffusion.py
# RhoDesign demo
python .\demo_rhodesign.py
# RNA-MSM demo
python .\demo_rna_msm.py
# LDDT demo
python .\demo_lddt.py
# RMSD demo
python .\demo_rmsd.py
# TM-score demo
python .\demo_tm_score.py
# Sequence recovery & Structure F1 demo
python .\demo_f1_recovery.py
```
Additional examples may be available: `rna_fm_demo.py`, `rhofold_demo.py`, `ribodiffusion_demo.py`.
## Datasets
You can download example datasets (e.g., Rfam, RNA Puzzles, CASP15) via the Python API or the CLI.
- Available dataset names: `Rfam`, `Rfam_original`, `RNA_Puzzles`, `CASP15`, `RNAsolo2`
- CLI:
```bash
# List available datasets
rnapy dataset list
# Download Rfam (from the HF mirror) with parallel workers
rnapy dataset download --dataset Rfam --max-workers 8
```
- Python:
```python
from rnapy.toolkit import RNAToolkit
toolkit = RNAToolkit()
print(toolkit.list_available_datasets())
toolkit.download_dataset("Rfam", max_workers=8)
```
## Configuration
YAML configs are provided under `./configs/` and `./demos/configs/`. You can:
- Pass `config_dir` to `RNAToolkit` to use custom defaults
- Override per-call parameters in `load_model(...)` and task methods
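One way to picture the two options is as layered dictionaries, where values given at call time take precedence over values loaded from the config directory. This precedence is an assumption, since the README does not spell it out, and `resolve_params` as well as the example keys and values are hypothetical.

```python
def resolve_params(file_defaults, **call_overrides):
    """Merge config-file defaults with per-call keyword arguments.

    Per-call values win; arguments left as None fall back to the defaults.
    """
    resolved = dict(file_defaults)
    resolved.update({k: v for k, v in call_overrides.items() if v is not None})
    return resolved


defaults = {"device": "cpu", "relax_steps": 1000}  # hypothetical config values
print(resolve_params(defaults, relax_steps=500, device=None))
# {'device': 'cpu', 'relax_steps': 500}
```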
## License
MIT License
## Acknowledgements
- RNA-FM: https://github.com/ml4bio/RNA-FM
- RhoFold: https://github.com/ml4bio/RhoFold
- RiboDiffusion: https://github.com/ml4bio/RiboDiffusion
- RhoDesign: https://github.com/ml4bio/RhoDesign
- RNA-MSM: https://github.com/yikunpku/RNA-MSM
Raw data
{
"_id": null,
"home_page": "https://github.com/linorman/rnapy",
"name": "rnapy",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "RNA, bioinformatics, machine-learning, structure-prediction, sequence-analysis",
"author": "Linorman",
"author_email": "Linorman <zyh52616@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/64/f8/3d1d0bdadea95b60736a70f94306587d6e6e750cd36971fa444095b84700/rnapy-3.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Unified RNA Analysis Toolkit - ML-powered RNA sequence analysis and structure prediction",
"version": "3.2.2",
"project_urls": {
"Bug Reports": "https://github.com/linorman/rnapy/issues",
"Documentation": "https://github.com/linorman/rnapy/blob/main/README.md",
"Homepage": "https://github.com/linorman/rnapy",
"Repository": "https://github.com/linorman/rnapy"
},
"split_keywords": [
"rna",
" bioinformatics",
" machine-learning",
" structure-prediction",
" sequence-analysis"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "8709b8681423b50d45b68f7187bd6f74ddedd0f012ac8e5389d295cdb2b42e3f",
"md5": "d73d38d03601ea66aa4929cc15133b19",
"sha256": "99c3e1e61865b1e2073b65d96296cbe8917a5f2584d7793e3893beec3e14086d"
},
"downloads": -1,
"filename": "rnapy-3.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d73d38d03601ea66aa4929cc15133b19",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 18161121,
"upload_time": "2025-10-14T01:07:13",
"upload_time_iso_8601": "2025-10-14T01:07:13.660900Z",
"url": "https://files.pythonhosted.org/packages/87/09/b8681423b50d45b68f7187bd6f74ddedd0f012ac8e5389d295cdb2b42e3f/rnapy-3.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "64f83d1d0bdadea95b60736a70f94306587d6e6e750cd36971fa444095b84700",
"md5": "8f5c81af581ab5cfbc0777c254d15a59",
"sha256": "a158ed9c1637fdade3f043864c40c1e39aa4d2dcbc5e46203ecf33d721a2055a"
},
"downloads": -1,
"filename": "rnapy-3.2.2.tar.gz",
"has_sig": false,
"md5_digest": "8f5c81af581ab5cfbc0777c254d15a59",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 18169748,
"upload_time": "2025-10-14T01:07:22",
"upload_time_iso_8601": "2025-10-14T01:07:22.558597Z",
"url": "https://files.pythonhosted.org/packages/64/f8/3d1d0bdadea95b60736a70f94306587d6e6e750cd36971fa444095b84700/rnapy-3.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-14 01:07:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "linorman",
"github_project": "rnapy",
"github_not_found": true,
"lcname": "rnapy"
}