============================
Ensemble Parallel MDAnalysis
============================
**Warning: This package is still under construction.**

|pypi| |travis| |readthedocs| |codecov|

|mdanalysis|

|colab|

ENPMDA is a parallel analysis package for ensemble simulations
powered by MDAnalysis.

It stores metadata in ``pandas.DataFrame``
and distributes computation jobs in ``dask.DataFrame``
so that the parallel analysis can be performed
not only within a single trajectory
but also across simulations and analyses.

It can be used for an initial inspection of
the raw trajectories as well as a framework for
extracting features from final production simulations
for downstream use, e.g. machine learning and Markov
state modeling. It automatically fixes periodic boundary condition (PBC) issues, and
aligns and centers the protein inside the simulation box.
It also works for multimeric proteins!

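
To make the PBC fix concrete, here is a minimal pure-Python sketch of the general idea (making a molecule whole, then centering it). It is not ENPMDA's implementation. It assumes an orthorhombic box, atoms listed in bonded order, and bonds shorter than half a box length. The helper names ``make_chain_whole`` and ``center_in_box`` are hypothetical.

.. code:: python

    def make_chain_whole(positions, box):
        """Shift each atom to the periodic image nearest its predecessor."""
        whole = [tuple(positions[0])]
        for pos in positions[1:]:
            prev = whole[-1]
            shifted = []
            for x, x_prev, length in zip(pos, prev, box):
                delta = x - x_prev
                # minimum-image convention: remove whole box lengths from the step
                delta -= length * round(delta / length)
                shifted.append(x_prev + delta)
            whole.append(tuple(shifted))
        return whole


    def center_in_box(positions, box):
        """Translate all atoms so that their centroid sits at the box center."""
        n = len(positions)
        centroid = [sum(p[i] for p in positions) / n for i in range(3)]
        shift = [length / 2 - c for length, c in zip(box, centroid)]
        return [tuple(x + s for x, s in zip(p, shift)) for p in positions]


    # a four-atom chain that crosses the box boundary along x
    box = (10.0, 10.0, 10.0)
    chain = [(9.0, 5.0, 5.0), (9.8, 5.0, 5.0), (0.6, 5.0, 5.0), (1.4, 5.0, 5.0)]
    whole = make_chain_whole(chain, box)
    centered = center_in_box(whole, box)

After ``make_chain_whole`` the consecutive atoms are again 0.8 apart along x, and after ``center_in_box`` the centroid lies at the middle of the box. For real trajectories, MDAnalysis provides on-the-fly trajectory transformations for this kind of task.
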
The framework is intended to be adaptable: MDAnalysis analysis
functions can simply be wrapped without worrying
about the parallel machinery behind them.

* Free software: GNU General Public License v3
* Documentation: https://ENPMDA.readthedocs.io.
Features
--------
* Parallel analysis for ensemble simulations.
* Dataframe for storing and accessing results.
* dask-based task scheduler, suitable for both workstations and clusters.
* Expandable analysis library powered by MDAnalysis.
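
The partition-and-map idea behind the dataframe and scheduler features can be sketched with the standard library alone. This is only an illustration: ``concurrent.futures`` stands in for dask here, the ``frame`` column and the squared-frame "feature" are made up, and the helper names are hypothetical rather than part of ENPMDA.

.. code:: python

    from concurrent.futures import ThreadPoolExecutor


    def split_into_partitions(rows, npartitions):
        """Split rows into at most npartitions contiguous, near-equal chunks."""
        npartitions = max(1, min(npartitions, len(rows)))
        size, extra = divmod(len(rows), npartitions)
        chunks, start = [], 0
        for i in range(npartitions):
            stop = start + size + (1 if i < extra else 0)
            chunks.append(rows[start:stop])
            start = stop
        return chunks


    def analyse_partition(rows):
        """Placeholder analysis: attach a made-up feature to every metadata row."""
        return [{**row, "feature": float(row["frame"]) ** 2} for row in rows]


    def run_analysis(rows, npartitions=4):
        """Analyse each partition in a worker and stitch results back in order."""
        chunks = split_into_partitions(rows, npartitions)
        with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
            results = list(pool.map(analyse_partition, chunks))
        return [row for chunk in results for row in chunk]


    # metadata table: one row per (system, frame), as plain dictionaries
    metadata = [{"system": s, "frame": f} for s in ("apo", "holo") for f in range(5)]
    features = run_analysis(metadata, npartitions=4)

Each partition is independent, so the same pattern scales from a single trajectory to many simulations; a real scheduler such as dask additionally handles placement on workstation cores or cluster nodes.
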
Example Code Snippet
--------------------
.. code:: python

    from ENPMDA import MDDataFrame
    from ENPMDA.preprocessing import TrajectoryEnsemble
    from ENPMDA.analysis import get_backbonetorsion, rmsd_to_init

    # construct trajectory ensemble
    # (ensemble_top_list and ensemble_traj_list are user-supplied lists
    # of topology and trajectory files)
    traj_ensemble = TrajectoryEnsemble(
                                    ensemble_name='ensemble',
                                    topology_list=ensemble_top_list,
                                    trajectory_list=ensemble_traj_list
                                    )
    traj_ensemble.load_ensemble()

    # initialize dataframe and add trajectory ensemble
    md_dataframe = MDDataFrame(dataframe_name='dataframe')
    md_dataframe.add_traj_ensemble(traj_ensemble, npartitions=16)

    # add analyses
    md_dataframe.add_analysis(get_backbonetorsion)
    md_dataframe.add_analysis(rmsd_to_init)

    # save dataframe
    md_dataframe.save('results')

    # retrieve features
    feature_dataframe = md_dataframe.get_feature([
        'torsion',
        'rmsd_to_init'
    ])

    # plot analysis results
    import seaborn as sns
    sns.barplot(data=feature_dataframe,
                x='system',
                y='rmsd_to_init')
    sns.lineplot(data=feature_dataframe,
                 x='traj_time',
                 y='0_phi_cos',
                 hue='system')

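
The snippet above expects ``ensemble_top_list`` and ``ensemble_traj_list`` to exist already. One way to build them is sketched below, under assumptions that are not part of ENPMDA: every simulation lives in its own sub-directory, the topology file name and trajectory glob pattern are fixed, and the two lists are meant to be parallel (one topology entry per trajectory). The helper ``collect_ensemble_files`` is hypothetical.

.. code:: python

    from pathlib import Path


    def collect_ensemble_files(root, top_name="system.pdb", traj_pattern="*.xtc"):
        """Return parallel lists of topology and trajectory paths found under root."""
        top_list, traj_list = [], []
        for sim_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
            topology = sim_dir / top_name
            if not topology.is_file():
                # skip directories that do not contain a topology file
                continue
            for trajectory in sorted(sim_dir.glob(traj_pattern)):
                # repeat the topology so both lists stay aligned
                top_list.append(str(topology))
                traj_list.append(str(trajectory))
        return top_list, traj_list


    # usage (assumed layout: simulations/<name>/system.pdb and *.xtc):
    # ensemble_top_list, ensemble_traj_list = collect_ensemble_files("simulations")

Sorting the directories and files keeps the ordering reproducible between runs.
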
Workflow Illustration
---------------------
.. image:: https://mermaid.ink/img/pako:eNqFklFPwjAQx7_Kpc8DjY8EMcLAmBhjhJgYRki3HqPStbPtAnPw3b0xppCY2Jde7-5_90vvKpYYgazHUsvzNTy9Rhro3M9HRjtvi8TDzPIPTLyx5Vg7zGKFC-h0BjBsUofVZyGTDSRrTDZSp2B0aurbyaxQ3EsqdHc45R6F-3d0exjNI8aFgH48iI0WKJbe5EaZtFwq6Xz_Kh4EFLQDo1W5tHx7O7MFRmxxUerZ7CGsfIso0QFXFrkoIbeYW5OgcyiCuhBgN-3Cy3AEK7lD0UKFZ1Bjgvq7X_jbb_I_-Y9scpQ9kIIYtVsZm6GAC96tVApihK2V3qPukpYFLEObcSloMlVdifRrzAinR2bMHVnBmf-NW8lpMq5OqJrWEVsZ7afy66S6uc53J1UbnPBMqrIJP2qPNmJ1mD7mQAhFLrjHsZAEynq0DBgwXngzLXXSvpucUHLan6xxHr4B8eTGgA
Use Cases
---------

.. image:: /docs/source/_static/example.png
   :width: 700
   :alt: Illustration of the ensemble analysis workflow.

Benchmarking
------------
For a system of 250,000 atoms (1,500 protein residues), the total time for analyzing 220,000 frames for

* RMSD to the initial frame
* Pore hydration
* All protein torsion angles
* All C-alpha positions
* 15,000 pairwise distances

is **10 minutes** using 5 nodes (640 cores) on Dardel_.

.. image:: /docs/source/_static/benchmark.png
   :width: 700
   :alt: Benchmark of the ensemble analysis workflow.
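
The quoted numbers translate into throughput as follows. This is plain arithmetic on the figures stated above, assuming the 10 minutes are wall-clock time and all cores are busy throughout.

.. code:: python

    frames = 220_000      # frames analyzed
    wall_minutes = 10     # total wall-clock time
    nodes = 5
    cores = 640

    wall_seconds = wall_minutes * 60
    frames_per_second = frames / wall_seconds           # aggregate throughput
    frames_per_core_second = frames_per_second / cores  # per-core throughput
    core_hours = cores * wall_seconds / 3600             # total compute cost
    cores_per_node = cores // nodes

    print(f"{frames_per_second:.0f} frames/s in total")
    print(f"{frames_per_core_second:.2f} frames/s per core")
    print(f"{core_hours:.1f} core-hours on {cores_per_node} cores per node")

That is roughly 367 frames per second overall, or about 0.57 frames per second per core, for all five analyses combined.
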
TODO
----

* Option to add more than one ensemble
* More analysis functions
* Unit testing
* Benchmarking
* Documentation
* Functions to cancel running tasks

See Also
--------
* MDAnalysis: https://www.mdanalysis.org/
* pmda: https://github.com/mdAnalysis/pmda
* dask: https://dask.org/
Credits
-------
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
.. _Dardel: https://www.pdc.kth.se/hpc-services/computing-systems/about-dardel-1.1053338
.. |mdanalysis| image:: https://img.shields.io/badge/powered%20by-MDAnalysis-orange.svg?logoWidth=16&logo=
    :alt: Powered by MDAnalysis
    :target: https://www.mdanalysis.org

.. |pypi| image:: https://img.shields.io/pypi/v/ENPMDA.svg
    :target: https://pypi.python.org/pypi/ENPMDA

.. |travis| image:: https://img.shields.io/travis/yuxuanzhuang/ENPMDA.svg
    :target: https://travis-ci.com/yuxuanzhuang/ENPMDA

.. |readthedocs| image:: https://readthedocs.org/projects/pip/badge/?version=latest&style=flat

.. |codecov| image:: https://codecov.io/gh/yuxuanzhuang/ENPMDA/branch/main/graph/badge.svg
    :alt: Coverage Status
    :target: https://codecov.io/gh/yuxuanzhuang/ENPMDA

.. |colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: open in colab
    :target: https://colab.research.google.com/github/yuxuanzhuang/ENPMDA/blob/main/docs/source/examples/examples.ipynb

Package Metadata
----------------

* Name: ENPMDA
* Version: 1.0.0 (uploaded 2025-10-13)
* Summary: Parallel analysis for ensemble simulations
* Author: Yuxuan Zhuang
* License: GNU General Public License v3
* Requires Python: >=3.10
* Homepage: https://github.com/yuxuanzhuang/ENPMDA
* Issues: https://github.com/yuxuanzhuang/ENPMDA/issues
* Keywords: ENPMDA, MDAnalysis, Dask, molecular-dynamics
* Requirements: dask>=2024.1.0, distributed>=2024.1.0, bokeh==2.4.2, tqdm>=4.65, numpy>=1.23, pandas>=1.5, scikit-learn>=1.3, loguru, mdanalysis