xenosite-fragment


Name: xenosite-fragment
Version: 0a11
Summary: Library for molecule fragment operations.
Upload time: 2023-07-23 04:33:59
Author: S. Joshua Swamidass
# Xenosite Fragment

A library for processing molecule fragments. 

Install from pypi:

```
pip install xenosite-fragment
```

Create a fragment from a SMILES or a fragment SMILES string:

```
>>> from xenosite.fragment import Fragment, Graph  # assumed import path
>>> str(Fragment("CCCC")) # Valid smiles
'C-C-C-C'

>>> str(Fragment("ccCC")) # not a valid SMILES
'c:c-C-C'
```
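Judging from the two outputs above, the fragment string writes every bond explicitly: `-` for a single bond and `:` for a bond between two aromatic (lowercase) atoms. The minimal sketch below reproduces both outputs for the simplest case only, an unbranched, ring-free chain of single-letter atoms. The helper `chain_fragment_string` is hypothetical, not part of xenosite-fragment, and the bond rule is an assumption inferred from these two examples.

```python
def chain_fragment_string(atoms: str) -> str:
    """Write an unbranched chain of single-letter atoms with explicit bonds.

    Hypothetical helper (not the xenosite-fragment API). Assumes a bond
    between two aromatic (lowercase) atoms is aromatic (':') and every
    other bond is single ('-').
    """
    if not atoms:
        return ""
    parts = [atoms[0]]
    for prev, curr in zip(atoms, atoms[1:]):
        bond = ":" if prev.islower() and curr.islower() else "-"
        parts.append(bond + curr)
    return "".join(parts)


print(chain_fragment_string("CCCC"))  # C-C-C-C
print(chain_fragment_string("ccCC"))  # c:c-C-C
```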

You can also create a fragment of a molecule from a string and, optionally, a list of the nodes (atom IDs)
in the fragment:

```
>>> F = Fragment("CCCCCCOc1ccccc1", [0,1,2,3,4,5])
>>> str(F)  # hexane
'C-C-C-C-C-C'
```

If node IDs are provided, they MUST select a connected fragment; otherwise, a ValueError is raised:

```
>>> F = Fragment("CCCCCCOc1ccccc1", [0,10]) 
Traceback (most recent call last):
  ...
ValueError: Multiple components in graph are not allowed.
```
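The connectivity requirement amounts to a graph search: starting from one selected atom and following only bonds between selected atoms, every selected atom must be reachable. A sketch, assuming atoms are numbered in SMILES order (0 to 5 for the hexyl chain, 6 for the oxygen, 7 to 12 for the ring) with the bond list written out by hand; `is_connected` is a hypothetical helper, not the library's own check.

```python
from collections import deque

# Bonds of CCCCCCOc1ccccc1, atoms numbered in SMILES order:
# 0-5 hexyl carbons, 6 oxygen, 7-12 aromatic ring (ring closure 12-7).
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7)]


def is_connected(nodes, bonds):
    """Return True if the selected nodes form one connected subgraph."""
    selected = set(nodes)
    if not selected:
        return False
    # Adjacency restricted to bonds whose endpoints are both selected.
    adjacency = {n: [] for n in selected}
    for a, b in bonds:
        if a in selected and b in selected:
            adjacency[a].append(b)
            adjacency[b].append(a)
    # Breadth-first search from an arbitrary selected node.
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == selected


print(is_connected([0, 1, 2, 3, 4, 5], BONDS))  # True: the hexyl chain
print(is_connected([0, 10], BONDS))             # False: two separate components
```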

Get the canonical representation of a fragment:

```
>>> Fragment("O-C").canonical().string
'C-O'
>>> Fragment("OC").canonical().string
'C-O'
>>> Fragment("CO").canonical().string
'C-O'
```

Get the reordering of nodes used to create the canonical
string representation. If remap=True, the IDs are remapped to the input
representation used to initialize the Fragment.

```
>>> Fragment("COC", [1,2]).canonical(remap=True).reordering
[2, 1]
>>> Fragment("COC", [1,2]).canonical().reordering
[1, 0]
```
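For an unbranched chain, one simple canonical form is the lexicographically smaller of the two strings obtained by reading the chain in each direction. The reordering is then the list of fragment positions in the chosen reading order, and remapping replaces each position with its original atom ID. The sketch below reproduces the C-O, [1, 0] and [2, 1] results above under that chain-only assumption; `canonical_chain` is hypothetical, and the library's canonicalization also handles branches and rings and may work quite differently.

```python
def canonical_chain(labels, ids=None, remap=False):
    """Canonicalize an unbranched chain of atom labels (hypothetical helper).

    Returns (string, reordering): reordering[k] is the fragment position of
    the atom written k-th. With remap=True, positions are replaced by the
    original IDs given in `ids`.
    """
    forward = list(range(len(labels)))
    candidates = []
    for order in (forward, forward[::-1]):
        string = "-".join(labels[i] for i in order)
        candidates.append((string, order))
    # Smallest string wins; on a tie, the forward order sorts first.
    string, order = min(candidates)
    if remap:
        if ids is None:
            raise ValueError("remap=True requires the original ids")
        order = [ids[i] for i in order]
    return string, order


print(canonical_chain(["O", "C"]))  # ('C-O', [1, 0])
print(canonical_chain(["C", "O"]))  # ('C-O', [0, 1])
# Nodes [1, 2] of "COC" are O and C, in that order:
print(canonical_chain(["O", "C"], ids=[1, 2], remap=True))  # ('C-O', [2, 1])
```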

Match a fragment to a molecule. By default, the IDs
correspond to the fragment IDs. If remap=True, the IDs
correspond to the input representation used when the Fragment
was initialized.

```
>>> smiles = "CCCC1CCOCN1"
>>> F = Fragment("CCCCCC") # hexane as a string
>>> list(F.matches(smiles)) # smiles string (least efficient)
[(0, 1, 2, 3, 4, 5)]

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles(smiles)
>>> list(F.matches(mol))  # RDKit mol
[(0, 1, 2, 3, 4, 5)]

>>> mol_graph = Graph.from_molecule(mol)
>>> list(F.matches(mol, mol_graph)) # RDKit mol and Graph (most efficient)
[(0, 1, 2, 3, 4, 5)]
```

Matching requires the fragment string of each match to be the same as
the fragment's own string. This differs from standard SMARTS matching,
and *prevents* rings from matching unclosed ring patterns (chains with the same atoms but no ring closure):

```
>>> str(Fragment("C1CCCCC1")) # cyclohexane
'C1-C-C-C-C-C-1'

>>> assert str(Fragment("C1CCCCC1")) != str(F)  # cyclohexane is not hexane
>>> list(F.matches("C1CCCCC1")) # Unlike SMARTS, no match!
[]
```
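The difference from SMARTS can be illustrated on chain patterns: after a depth-first search finds a path of atoms with the right element labels, the sketch counts every bond among those atoms and rejects the match if there are more bonds than the pattern has, which is exactly what happens when the six carbons of cyclohexane are matched against hexane. `chain_matches` is a hypothetical helper, not the xenosite-fragment API; it handles only unbranched patterns, ignores bond orders, and the bond lists are written out by hand in SMILES atom order.

```python
def chain_matches(pattern, labels, bonds):
    """Find matches of an unbranched chain pattern in a molecule graph.

    Hypothetical helper. A match must be an *induced* subgraph: any extra
    bond among the matched atoms (such as a ring closure) would change the
    fragment string, so that match is rejected.
    """
    adjacency = {i: set() for i in range(len(labels))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    matches, seen = [], set()

    def extend(path):
        if len(path) == len(pattern):
            atoms = set(path)
            induced = sum(1 for a, b in bonds if a in atoms and b in atoms)
            # A chain of n atoms has exactly n - 1 bonds; more means a ring.
            # The same chain read backwards is not reported twice.
            if induced == len(pattern) - 1 and frozenset(path) not in seen:
                seen.add(frozenset(path))
                matches.append(tuple(path))
            return
        for nxt in sorted(adjacency[path[-1]]):
            if nxt not in path and labels[nxt] == pattern[len(path)]:
                extend(path + [nxt])

    for start, label in enumerate(labels):
        if label == pattern[0]:
            extend([start])
    return matches


# CCCC1CCOCN1: atoms 0-5 C, 6 O, 7 C, 8 N; ring closure 8-3.
host_labels = list("CCCCCCOCN")
host_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 3)]

# C1CCCCC1 (cyclohexane): six carbons in a ring.
ring_labels = list("CCCCCC")
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

# c1ccccc1OCCC: atoms 0-5 aromatic ring, 6 O, 7-9 propyl carbons.
ether_labels = list("ccccccOCCC")
ether_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (5, 6), (6, 7), (7, 8), (8, 9)]

hexane = list("CCCCCC")
print(chain_matches(hexane, host_labels, host_bonds))  # [(0, 1, 2, 3, 4, 5)]
print(chain_matches(hexane, ring_labels, ring_bonds))  # []: the ring closure adds a 6th bond
print(chain_matches(["O", "c", "c"], ether_labels, ether_bonds))  # [(6, 5, 0), (6, 5, 4)]
```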

Efficiently create multiple fragments by reusing a
precomputed graph:

```
>>> from rdkit import Chem
>>>
>>> mol = Chem.MolFromSmiles("c1ccccc1OCCC")
>>> mol_graph = Graph.from_molecule(mol)
>>>
>>> f1 = Fragment(mol_graph, [0])
>>> f2 = Fragment(mol_graph, [6,5,4])
```

Find matches to fragments:

```
>>> list(f1.matches(mol))
[(0,), (1,), (2,), (3,), (4,), (5,)]

>>> list(f2.matches(mol))
[(6, 5, 4), (6, 5, 0)]
```

Fragments can report whether they are canonically the same as other fragments or as strings.

```
>>> Fragment("CCO") == Fragment("OCC")
True
>>> Fragment("CCO") == "C-C-O"
True
```

Note, however, that strings are not converted to canonical form, so comparing a fragment with a plain SMILES string returns False:

```
>>> Fragment("CCO") == "CCO"
False
```
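These comparison rules can be sketched with a toy class for unbranched chains: equality between two fragments compares their canonical strings, while a string on the right-hand side is compared verbatim, without canonicalization. `ChainFragment` is hypothetical and assumes a chain-only canonical form (the smaller of the two reading directions); it is not how the library's Fragment class is implemented.

```python
class ChainFragment:
    """Toy fragment for unbranched chains (hypothetical, not the library class).

    Equality compares canonical strings. A plain string is compared as-is,
    without being canonicalized.
    """

    def __init__(self, atoms: str):
        forward = "-".join(atoms)
        backward = "-".join(reversed(atoms))
        # Canonical form for a chain: the smaller of the two reading directions.
        self.string = min(forward, backward)

    def __eq__(self, other):
        if isinstance(other, ChainFragment):
            return self.string == other.string
        if isinstance(other, str):
            return self.string == other  # `other` is NOT canonicalized
        return NotImplemented

    def __hash__(self):
        # Keep hashing consistent with equality between fragments.
        return hash(self.string)


print(ChainFragment("CCO") == ChainFragment("OCC"))  # True
print(ChainFragment("CCO") == "C-C-O")               # True
print(ChainFragment("CCO") == "CCO")                 # False
```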

Enumerate and compute statistics on all the subgraphs in a molecule:

```
>>> from xenosite.fragment.net import SubGraphFragmentNetwork
>>> N = SubGraphFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C', 'C', 'C-O-C', 'C-O', 'O', 'C-C1-C-O-C-1', 'C1-C-O-C-1', 'C-C-C-O', 'C-C(-C)-C', 'C-C-O', 'C-C-C']
>>> fragments["size"].to_numpy()
array([2, 1, 3, 2, 1, 5, 4, 4, 4, 3, 3])
```
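Conceptually, subgraph enumeration visits every connected subset of atoms and groups the subsets by a canonical form. The brute-force sketch below does this for `CC1COC1`, using the smallest (labels, edges) tuple over all relabelings as a stand-in canonical form; it finds 11 distinct fragments with the same size distribution as the `size` column above. All names are hypothetical, the bond list is written by hand, and this is a teaching aid rather than how SubGraphFragmentNetwork enumerates subgraphs.

```python
from collections import Counter
from itertools import combinations, permutations

# CC1COC1: atoms 0 C, 1 C, 2 C, 3 O, 4 C; ring closure 4-1.
LABELS = ["C", "C", "C", "O", "C"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]


def is_connected(nodes, bonds):
    """Depth-first search restricted to the selected nodes."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in bonds:
            for u, v in ((a, b), (b, a)):
                if u == node and v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == nodes


def canonical_key(nodes, labels, bonds):
    """Smallest (labels, edges) tuple over all relabelings of the subgraph.

    Two subgraphs get the same key exactly when they are isomorphic as
    labeled graphs. The cost is factorial in the subgraph size.
    """
    nodes = list(nodes)
    inside = set(nodes)
    sub_bonds = [(a, b) for a, b in bonds if a in inside and b in inside]
    best = None
    for perm in permutations(range(len(nodes))):
        position = dict(zip(nodes, perm))
        new_labels = [""] * len(nodes)
        for node in nodes:
            new_labels[position[node]] = labels[node]
        edges = sorted(tuple(sorted((position[a], position[b]))) for a, b in sub_bonds)
        key = (tuple(new_labels), tuple(edges))
        if best is None or key < best:
            best = key
    return best


def enumerate_fragments(labels, bonds):
    """Group every connected node subset by its canonical key."""
    groups = {}
    for size in range(1, len(labels) + 1):
        for nodes in combinations(range(len(labels)), size):
            if is_connected(nodes, bonds):
                groups.setdefault(canonical_key(nodes, labels, bonds), []).append(nodes)
    return groups


fragments = enumerate_fragments(LABELS, BONDS)
print(len(fragments))                           # 11 distinct fragments
print(sum(len(v) for v in fragments.values()))  # 21 connected subgraphs
print(sorted(Counter(len(k[0]) for k in fragments).items()))
# [(1, 2), (2, 2), (3, 3), (4, 3), (5, 1)]
```

The factorial cost of `canonical_key` is why a practical implementation needs a proper canonical-labeling algorithm; the sketch only shows what is being computed.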

More useful fragments can be enumerated by collapsing all atoms in a ring into a single node
during subgraph enumeration, so that rings are never split into partial-ring fragments:

```
>>> from xenosite.fragment.net import RingFragmentNetwork
>>> N = RingFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C1-C-O-C-1', 'C', 'C1-C-O-C-1']
>>> fragments["size"].to_numpy()
array([5, 1, 4])
```
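A sketch of the collapsing step, under stated assumptions: a bond is treated as a ring bond when its endpoints stay connected after the bond is removed, atoms joined by ring bonds are merged into one node (which would also merge fused rings, an assumption about how ring systems are handled), and connected unions of those nodes are enumerated. For `CC1COC1` this yields three fragments of 1, 4, and 5 atoms, matching the `size` column above. The helpers are hypothetical, not the RingFragmentNetwork implementation.

```python
from itertools import combinations

# CC1COC1: atoms 0 C, 1 C, 2 C, 3 O, 4 C; ring closure 4-1.
N_ATOMS = 5
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]


def reachable(start, edges):
    """Set of nodes reachable from `start` using the given edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in edges:
            for u, v in ((a, b), (b, a)):
                if u == node and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen


def ring_groups(n_atoms, bonds):
    """Group atoms so that each ring system becomes one node.

    A bond lies in a ring when its endpoints stay connected after removing it.
    Atoms joined by ring bonds are merged; every other atom is its own group.
    """
    ring_bonds = [
        (a, b) for i, (a, b) in enumerate(bonds)
        if b in reachable(a, bonds[:i] + bonds[i + 1:])
    ]
    groups, assigned = [], set()
    for atom in range(n_atoms):
        if atom not in assigned:
            group = reachable(atom, ring_bonds)
            assigned |= group
            groups.append(frozenset(group))
    return groups


def collapsed_fragments(n_atoms, bonds):
    """Enumerate connected unions of groups; return the atom set of each."""
    groups = ring_groups(n_atoms, bonds)
    group_of = {atom: g for g in groups for atom in g}
    # Two groups are adjacent if any bond joins them.
    group_edges = {(group_of[a], group_of[b]) for a, b in bonds
                   if group_of[a] != group_of[b]}
    fragments = []
    for size in range(1, len(groups) + 1):
        for chosen in combinations(groups, size):
            inside = [(g, h) for g, h in group_edges if g in chosen and h in chosen]
            if reachable(chosen[0], inside) == set(chosen):
                fragments.append(frozenset().union(*chosen))
    return fragments


sizes = sorted(len(f) for f in collapsed_fragments(N_ATOMS, BONDS))
print(sizes)  # [1, 4, 5]: the methyl carbon, the ring, and the whole molecule
```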


            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "xenosite-fragment",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "S. Joshua Swamidass",
    "author_email": "swamidass@wustl.edu",
    "download_url": "https://files.pythonhosted.org/packages/ed/23/e89872e8634153f304c667e8e999a62a9e15addac25e6bf19bb96b9ff6c7/xenosite-fragment-0a11.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Library for molecule fragment operations.",
    "version": "0a11",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ed23e89872e8634153f304c667e8e999a62a9e15addac25e6bf19bb96b9ff6c7",
                "md5": "9fb64f8dfdfbc8b20634901d138ff3d5",
                "sha256": "11586d5c31eece2b279be90bf8df3bc09bb87881420ce1ef8d51bb2d0f63bb89"
            },
            "downloads": -1,
            "filename": "xenosite-fragment-0a11.tar.gz",
            "has_sig": false,
            "md5_digest": "9fb64f8dfdfbc8b20634901d138ff3d5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 26793,
            "upload_time": "2023-07-23T04:33:59",
            "upload_time_iso_8601": "2023-07-23T04:33:59.893542Z",
            "url": "https://files.pythonhosted.org/packages/ed/23/e89872e8634153f304c667e8e999a62a9e15addac25e6bf19bb96b9ff6c7/xenosite-fragment-0a11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-23 04:33:59",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "xenosite-fragment"
}
        