| Field | Value |
|---|---|
| Name | xenosite-fragment |
| Version | 0a11 |
| Summary | Library for molecule fragment operations. |
| Upload time | 2023-07-23 04:33:59 |
| Author | S. Joshua Swamidass |
| Requirements | No requirements were recorded. |
# Xenosite Fragment
A library for processing molecule fragments.
Install from PyPI:
```
pip install xenosite-fragment
```
Create a fragment from a SMILES or a fragment SMILES string:
```
>>> str(Fragment("CCCC")) # Valid smiles
'C-C-C-C'
>>> str(Fragment("ccCC")) # not a valid SMILES
'c:c-C-C'
```
A fragment can also be created from part of a molecule by passing a SMILES string together with
a list of the node (atom) IDs to include:
```
>>> F = Fragment("CCCCCCOc1ccccc1", [0,1,2,3,4,5])
>>> str(F) # hexane
'C-C-C-C-C-C'
```
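One plausible reading of selecting node IDs is that the fragment keeps those atoms and every bond between them (the induced subgraph). The following is a minimal pure-Python sketch of that idea, not xenosite's implementation: `induced_subgraph` is a hypothetical helper, and the graph for `CCCCCCOc1ccccc1` is written out by hand as atom labels plus a bond list instead of being parsed with RDKit.

```python
def induced_subgraph(labels, edges, ids):
    """Return (labels, edges) of the subgraph induced by `ids`.

    Hypothetical helper. Node i of the result corresponds to ids[i]
    in the input graph; only bonds with both ends selected are kept.
    """
    index = {atom: i for i, atom in enumerate(ids)}  # input ID -> fragment ID
    sub_labels = [labels[atom] for atom in ids]
    sub_edges = [
        (index[a], index[b])
        for a, b in edges
        if a in index and b in index
    ]
    return sub_labels, sub_edges


# Hand-built graph for "CCCCCCOc1ccccc1": six chain carbons (0-5), an ether
# oxygen (6), and an aromatic ring (7-12) closed by the bond 12-7.
labels = ["C"] * 6 + ["O"] + ["c"] * 6
edges = [(i, i + 1) for i in range(12)] + [(12, 7)]

hexane_labels, hexane_edges = induced_subgraph(labels, edges, [0, 1, 2, 3, 4, 5])
print(hexane_labels)  # ['C', 'C', 'C', 'C', 'C', 'C']
print(hexane_edges)   # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```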
If IDs are provided, they MUST select a connected fragment.
```
>>> F = Fragment("CCCCCCOc1ccccc1", [0,10])
Traceback (most recent call last):
...
ValueError: Multiple components in graph are not allowed.
```
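The connectivity requirement can be checked with a breadth-first search over the bonds among the selected atoms. This is a minimal sketch under that assumption: `check_connected` is a hypothetical helper, the bond list for `CCCCCCOc1ccccc1` is written by hand, and only the error message is copied from the README.

```python
from collections import deque


def check_connected(ids, edges):
    """Raise ValueError unless the atoms in `ids` form one connected piece.

    Hypothetical helper: only bonds with both ends among `ids` are followed.
    """
    selected = set(ids)
    if not selected:
        raise ValueError("No atoms selected.")
    adjacency = {atom: set() for atom in selected}
    for a, b in edges:
        if a in selected and b in selected:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(selected))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search from an arbitrary selected atom
        for nbr in adjacency[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    if seen != selected:
        raise ValueError("Multiple components in graph are not allowed.")


# Hand-built bonds for "CCCCCCOc1ccccc1": atoms 0-12 in a row, ring closure 12-7.
edges = [(i, i + 1) for i in range(12)] + [(12, 7)]
check_connected([0, 1, 2, 3, 4, 5], edges)  # hexane: passes silently
try:
    check_connected([0, 10], edges)
except ValueError as err:
    print(err)  # Multiple components in graph are not allowed.
```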
Get the canonical representation of a fragment:
```
>>> Fragment("O-C").canonical().string
'C-O'
>>> Fragment("OC").canonical().string
'C-O'
>>> Fragment("CO").canonical().string
'C-O'
```
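A canonical form gives every isomorphic fragment the same representation regardless of the order in which its atoms were written. The README does not describe xenosite's canonicalization algorithm, so the sketch below swaps in the simplest correct technique: brute force over all node orderings, keeping the lexicographically smallest encoding. `canonical_form` is hypothetical, returns a tuple key rather than a `C-O`-style string, and is exponential in fragment size.

```python
from itertools import permutations


def canonical_form(labels, edges):
    """Brute-force canonical form of a small labeled graph.

    Returns (key, order): key is the smallest (labels, edges) encoding over
    all node orderings, and order[k] is the input node placed at position k.
    Isomorphic graphs always produce the same key.
    """
    best, best_order = None, None
    for order in permutations(range(len(labels))):
        position = {node: k for k, node in enumerate(order)}
        key = (
            tuple(labels[node] for node in order),
            tuple(sorted(tuple(sorted((position[a], position[b]))) for a, b in edges)),
        )
        if best is None or key < best:
            best, best_order = key, list(order)
    return best, best_order


key_oc, order_oc = canonical_form(["O", "C"], [(0, 1)])
key_co, order_co = canonical_form(["C", "O"], [(0, 1)])
print(key_oc == key_co)  # True
print(order_oc)          # [1, 0]
```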
Get the reordering of nodes used to create the canonical
string representation. If remap=True, the IDs are remapped to the input
representation used to initialize the Fragment.
```
>>> Fragment("COC", [1,2]).canonical(remap=True).reordering
[2, 1]
>>> Fragment("COC", [1,2]).canonical().reordering
[1, 0]
```
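Assuming `reordering[k]` names the fragment node placed at canonical position `k` (an interpretation consistent with the `COC` example, not confirmed from the library's source), remapping is a lookup through the list of IDs the fragment was built from. `remap_reordering` is a hypothetical helper for illustration.

```python
def remap_reordering(reordering, ids):
    """Translate a canonical reordering from fragment IDs to input IDs.

    Hypothetical helper: reordering[k] is the fragment node at canonical
    position k, and ids[i] is the input (molecule) ID of fragment node i.
    """
    return [ids[node] for node in reordering]


# Fragment("COC", [1, 2]) keeps atom 1 (O) and atom 2 (C); putting C first
# gives the fragment-ID reordering [1, 0].
print(remap_reordering([1, 0], [1, 2]))  # [2, 1]
```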
Match a fragment to a molecule. By default, the IDs
correspond to fragment IDs. If remap=True, the IDs
correspond to the input representation used to initialize
the Fragment.
```
>>> smiles = "CCCC1CCOCN1"
>>> F = Fragment("CCCCCC") # hexane as a string
>>> list(F.matches(smiles)) # smiles string (least efficient)
[(0, 1, 2, 3, 4, 5)]
>>> import rdkit.Chem
>>> mol = rdkit.Chem.MolFromSmiles(smiles)
>>> list(F.matches(mol)) # RDKit mol
[(0, 1, 2, 3, 4, 5)]
>>> mol_graph = Graph.from_molecule(mol)
>>> list(F.matches(mol, mol_graph)) # RDKit mol and Graph (most efficient)
[(0, 1, 2, 3, 4, 5)]
```
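Matching can be pictured as a backtracking search that assigns molecule atoms to fragment nodes one at a time. The sketch below is a generic substructure search, not xenosite's matcher: `find_matches` is hypothetical, bond orders are ignored, and reporting one match per atom set is an assumption about how symmetric duplicates (such as a chain read in both directions) are handled. On a hand-built graph for `CCCC1CCOCN1` it yields the same single hexane match shown above.

```python
def find_matches(frag_labels, frag_edges, mol_labels, mol_edges):
    """Yield one tuple of molecule atom IDs per match, ordered by fragment node.

    Fragment node k may map to atom a only if the labels agree and every bond
    from node k to an earlier node is also a molecule bond. Matches covering
    the same atom set are reported once. Bond orders are ignored.
    """
    def adjacency(n, edges):
        adj = {i: set() for i in range(n)}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        return adj

    mol_adj = adjacency(len(mol_labels), mol_edges)
    frag_adj = adjacency(len(frag_labels), frag_edges)
    seen_atom_sets = set()

    def extend(mapping):
        k = len(mapping)
        if k == len(frag_labels):  # every fragment node has an atom
            atoms = frozenset(mapping)
            if atoms not in seen_atom_sets:
                seen_atom_sets.add(atoms)
                yield tuple(mapping)
            return
        for atom in range(len(mol_labels)):
            if atom in mapping or mol_labels[atom] != frag_labels[k]:
                continue
            if all(mapping[j] in mol_adj[atom] for j in frag_adj[k] if j < k):
                mapping.append(atom)
                yield from extend(mapping)
                mapping.pop()  # backtrack

    yield from extend([])


# Hand-built graph for "CCCC1CCOCN1" (atom 8 is N, closing the ring to atom 3).
mol_labels = ["C", "C", "C", "C", "C", "C", "O", "C", "N"]
mol_edges = [(i, i + 1) for i in range(8)] + [(8, 3)]
hexane_labels = ["C"] * 6
hexane_edges = [(i, i + 1) for i in range(5)]
print(list(find_matches(hexane_labels, hexane_edges, mol_labels, mol_edges)))
# [(0, 1, 2, 3, 4, 5)]
```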
Matching requires the fragment string of each match to be identical to
the fragment's own string. This differs from standard SMARTS matching,
and *prevents* rings from matching open-chain patterns such as hexane:
```
>>> str(Fragment("C1CCCCC1")) # cyclohexane
'C1-C-C-C-C-C-1'
>>> assert str(Fragment("C1CCCCC1")) != str(F)  # cyclohexane is not hexane
>>> list(F.matches("C1CCCCC1")) # Unlike SMARTS, no match!
[]
```
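One way to obtain this ring-excluding behavior, assumed here rather than taken from the library, is to reject any match whose atoms carry bonds beyond those in the fragment. The bond-count test below is a simplified stand-in for the string comparison the README describes; `bonds_within` and `is_exact_match` are hypothetical names, and the graphs are written by hand.

```python
def bonds_within(atoms, mol_edges):
    """Count molecule bonds whose two ends are both among `atoms`."""
    chosen = set(atoms)
    return sum(1 for a, b in mol_edges if a in chosen and b in chosen)


def is_exact_match(atoms, fragment_bond_count, mol_edges):
    """Hypothetical rule: reject a match if its atoms carry extra bonds."""
    return bonds_within(atoms, mol_edges) == fragment_bond_count


HEXANE_BONDS = 5  # C-C-C-C-C-C has five bonds

cyclohexane_edges = [(i, (i + 1) % 6) for i in range(6)]
chain_in_ring = (0, 1, 2, 3, 4, 5)  # a six-carbon walk around the ring

# Every hexane bond exists in cyclohexane, so a SMARTS-style match succeeds...
print(all(
    (a, b) in cyclohexane_edges or (b, a) in cyclohexane_edges
    for a, b in zip(chain_in_ring, chain_in_ring[1:])
))  # True
# ...but the six atoms also carry the ring-closing bond 5-0, so they are rejected.
print(is_exact_match(chain_in_ring, HEXANE_BONDS, cyclohexane_edges))  # False

# In "CCCC1CCOCN1", atoms 0-5 carry exactly the five chain bonds, so they are kept.
mol_edges = [(i, i + 1) for i in range(8)] + [(8, 3)]
print(is_exact_match((0, 1, 2, 3, 4, 5), HEXANE_BONDS, mol_edges))  # True
```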
Efficiently create multiple fragments by reusing a
precomputed graph:
```
>>> import rdkit.Chem
>>>
>>> mol = rdkit.Chem.MolFromSmiles("c1ccccc1OCCC")
>>> mol_graph = Graph.from_molecule(mol)
>>>
>>> f1 = Fragment(mol_graph, [0])
>>> f2 = Fragment(mol_graph, [6,5,4])
```
Find matches to fragments:
```
>>> list(f1.matches(mol))
[(0,), (1,), (2,), (3,), (4,), (5,)]
>>> list(f2.matches(mol))
[(6, 5, 4), (6, 5, 0)]
```
Fragments can be compared for canonical equality with other fragments and with strings:
```
>>> Fragment("CCO") == Fragment("OCC")
True
>>> Fragment("CCO") == "C-C-O"
True
```
Note, however, that strings are not converted to canonical form. Therefore,
```
>>> Fragment("CCO") == "CCO"
False
```
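These equality rules can be mimicked with a small `__eq__` that compares canonical strings when given another fragment and compares the raw string unchanged otherwise. `ChainFragment` is a hypothetical toy, not xenosite's class: it only handles unbranched chains of single-character atoms, canonicalized by taking the smaller of the two reading directions.

```python
class ChainFragment:
    """Hypothetical toy fragment for unbranched chains only.

    Each character is one atom; the canonical string is the lexicographically
    smaller reading direction joined with '-' (bond orders are ignored).
    """

    def __init__(self, atoms):
        atoms = list(atoms)
        self.string = "-".join(min(atoms, atoms[::-1]))

    def __eq__(self, other):
        if isinstance(other, ChainFragment):
            return self.string == other.string  # canonical vs canonical
        if isinstance(other, str):
            return self.string == other  # the string is NOT canonicalized
        return NotImplemented

    def __hash__(self):
        return hash(self.string)


print(ChainFragment("CCO") == ChainFragment("OCC"))  # True
print(ChainFragment("CCO") == "C-C-O")               # True
print(ChainFragment("CCO") == "CCO")                 # False
```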
Enumerate and compute statistics on all the subgraphs in a molecule:
```
>>> from xenosite.fragment.net import SubGraphFragmentNetwork
>>> N = SubGraphFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C', 'C', 'C-O-C', 'C-O', 'O', 'C-C1-C-O-C-1', 'C1-C-O-C-1', 'C-C-C-O', 'C-C(-C)-C', 'C-C-O', 'C-C-C']
>>> fragments["size"].to_numpy()
array([2, 1, 3, 2, 1, 5, 4, 4, 4, 3, 3])
```
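Enumerating subgraphs means visiting every connected set of atoms. The sketch below grows connected sets one neighbor at a time on a hand-built graph for `CC1COC1`; `connected_subsets` is a hypothetical helper and does not compute xenosite's statistics. It finds 21 connected atom sets, while the table above lists 11 distinct fragment strings, so several atom sets share a string (for example, each of the four carbon atoms on its own gives `C`).

```python
from collections import Counter


def connected_subsets(adjacency):
    """Return every connected set of nodes in a small undirected graph.

    Sets are grown one neighbor at a time from single nodes. Every connected
    set of size k + 1 contains a connected set of size k, so none are missed,
    and `seen` ensures each set is reported once.
    """
    frontier = [frozenset([node]) for node in adjacency]
    seen = set(frontier)
    while frontier:
        next_frontier = []
        for subset in frontier:
            neighbors = set().union(*(adjacency[n] for n in subset)) - subset
            for nbr in neighbors:
                grown = subset | {nbr}
                if grown not in seen:
                    seen.add(grown)
                    next_frontier.append(grown)
        frontier = next_frontier
    return seen


# Hand-built "CC1COC1": methyl C0 bonded to ring C1; ring C1-C2-O3-C4.
adjacency = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
subsets = connected_subsets(adjacency)
print(len(subsets))  # 21
print(sorted(Counter(len(s) for s in subsets).items()))
# [(1, 5), (2, 5), (3, 6), (4, 4), (5, 1)]
```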
Better (and fewer) fragments can be enumerated by collapsing all atoms in a ring into a single node
during subgraph enumeration, so that no fragment contains a partial ring:
```
>>> from xenosite.fragment.net import RingFragmentNetwork
>>> N = RingFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C1-C-O-C-1', 'C', 'C1-C-O-C-1']
>>> fragments["size"].to_numpy()
array([5, 1, 4])
```
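Ring collapsing can be sketched as building a smaller graph in which each ring is one super-node, then enumerating connected sets of super-nodes (brute force here, fine only for tiny graphs) and counting the atoms behind them. In this sketch the ring membership of `CC1COC1` is supplied by hand (a real implementation would need ring perception, for example from RDKit), rings are assumed not to share atoms, and all helper names are hypothetical. The resulting sizes agree with the `size` column above, up to order.

```python
from itertools import combinations


def collapse_rings(adjacency, rings):
    """Merge the atoms of each ring into one super-node.

    Returns (members, collapsed): members[s] is the set of original atoms
    behind super-node s, and collapsed is the super-node adjacency.
    Assumes rings do not share atoms (no fused ring systems).
    """
    owner, members = {}, []
    for ring in rings:
        for atom in ring:
            owner[atom] = len(members)
        members.append(set(ring))
    for atom in adjacency:
        if atom not in owner:  # non-ring atoms stay single nodes
            owner[atom] = len(members)
            members.append({atom})
    collapsed = {s: set() for s in range(len(members))}
    for atom, neighbors in adjacency.items():
        for nbr in neighbors:
            if owner[atom] != owner[nbr]:  # drop bonds inside a ring
                collapsed[owner[atom]].add(owner[nbr])
    return members, collapsed


def is_connected(nodes, adjacency):
    """Depth-first check that `nodes` form one connected piece."""
    nodes = set(nodes)
    stack = [next(iter(nodes))]
    seen = set(stack)
    while stack:
        for nbr in adjacency[stack.pop()] & nodes:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen == nodes


# Hand-built "CC1COC1": methyl C0 bonded to ring C1; ring C1-C2-O3-C4.
adjacency = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
members, collapsed = collapse_rings(adjacency, rings=[[1, 2, 3, 4]])

sizes = []
for k in range(1, len(members) + 1):
    for combo in combinations(range(len(members)), k):
        if is_connected(combo, collapsed):
            # a fragment's size counts the original atoms it covers
            sizes.append(sum(len(members[s]) for s in combo))
print(sorted(sizes))  # [1, 4, 5]
```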
Raw data
{
"_id": null,
"home_page": "",
"name": "xenosite-fragment",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "S. Joshua Swamidass",
"author_email": "swamidass@wustl.edu",
"download_url": "https://files.pythonhosted.org/packages/ed/23/e89872e8634153f304c667e8e999a62a9e15addac25e6bf19bb96b9ff6c7/xenosite-fragment-0a11.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Library for molecule fragment operations.",
"version": "0a11",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ed23e89872e8634153f304c667e8e999a62a9e15addac25e6bf19bb96b9ff6c7",
"md5": "9fb64f8dfdfbc8b20634901d138ff3d5",
"sha256": "11586d5c31eece2b279be90bf8df3bc09bb87881420ce1ef8d51bb2d0f63bb89"
},
"downloads": -1,
"filename": "xenosite-fragment-0a11.tar.gz",
"has_sig": false,
"md5_digest": "9fb64f8dfdfbc8b20634901d138ff3d5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 26793,
"upload_time": "2023-07-23T04:33:59",
"upload_time_iso_8601": "2023-07-23T04:33:59.893542Z",
"url": "https://files.pythonhosted.org/packages/ed/23/e89872e8634153f304c667e8e999a62a9e15addac25e6bf19bb96b9ff6c7/xenosite-fragment-0a11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-23 04:33:59",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "xenosite-fragment"
}