group-decomposition


Name: group-decomposition
Version: 0.6.0
Summary: A plugin for extracting data from .sum files and manipulating them
Upload time: 2024-01-19 16:36:21
Requires Python: >=3.10
Keywords: qtaim, functional groups
# Identifying fragments in molecule SMILES codes

Python functions to identify fragments in a molecule (or set of molecules) from their SMILES codes or from .mol and/or .cml files. The fragments are chemically meaningful: they include rings, linkers, and side chains of molecules, as well as the functional groups (as defined by Ertl: heteroatoms and double bonds) and alkyl groups that compose them. There are currently two main functionalities, one at the single-molecule level and one at the batch level. At the single-molecule level, the molecule is broken into fragments, and each fragment retains its connectivity. At the batch level, connectivity is removed, the unique fragments across the set of molecules are identified, and their occurrences are counted. For example, this would report that there are N methyl groups in the set of molecules.


The code is implemented primarily with RDKit, using a mix of the rdScaffoldNetwork module and SMARTS pattern matching. rdScaffoldNetwork is used to identify ring systems because its bond-breaking rules are flexible and it does not break fused rings. The molecule is fragmented with RDKit's FragmentOnBonds function, which can attach labels to the dummy atoms it produces so that the labels record connectivity. SMARTS matching is then used to break the molecule further into functional groups and alkyl groups, by breaking single bonds between non-ring sp3 carbons and ring atoms or heteroatoms.
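The bond-cutting rule in the last sentence can be sketched without RDKit on a hand-built molecular graph. This is a hypothetical illustration, not the package's code: the `Atom` class, its `sp3` flag, and the `fragment` function are made up for this example, ring-system perception and Ertl functional-group detection are skipped, and the atoms of (fluoromethyl)cyclohexane, `FCC1CCCCC1`, are indexed by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str
    in_ring: bool = False
    sp3: bool = False  # hypothetical flag; a toolkit such as RDKit would supply hybridization


def is_acyclic_sp3_carbon(atom: Atom) -> bool:
    return atom.symbol == "C" and atom.sp3 and not atom.in_ring


def is_ring_or_hetero(atom: Atom) -> bool:
    return atom.in_ring or atom.symbol not in ("C", "H")


def fragment(atoms: list[Atom], bonds: list[tuple[int, int, int]]) -> list[dict]:
    """Cut bonds matching the toy rule, label cuts 1..k, return the connected pieces."""
    # Single bonds between an acyclic sp3 carbon and a ring atom or heteroatom are cut.
    cuts = [
        i for i, (a, b, order) in enumerate(bonds)
        if order == 1 and (
            (is_acyclic_sp3_carbon(atoms[a]) and is_ring_or_hetero(atoms[b]))
            or (is_acyclic_sp3_carbon(atoms[b]) and is_ring_or_hetero(atoms[a]))
        )
    ]
    label_of = {bond_idx: n for n, bond_idx in enumerate(cuts, start=1)}

    neighbours = {i: [] for i in range(len(atoms))}
    dummies = {i: [] for i in range(len(atoms))}  # labels of [n*] placeholders per atom
    for idx, (a, b, _) in enumerate(bonds):
        if idx in label_of:
            # Both ends of a cut bond receive a placeholder with the same label n.
            dummies[a].append(label_of[idx])
            dummies[b].append(label_of[idx])
        else:
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Connected components of the graph left after cutting (iterative DFS).
    seen, pieces = set(), []
    for start in range(len(atoms)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in neighbours[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        comp.sort()
        pieces.append({"atoms": comp, "labels": sorted(n for i in comp for n in dummies[i])})
    return pieces


# Toy graph for FCC1CCCCC1; hydrogens are implicit, bonds are (atom, atom, order).
atoms = [Atom("F"), Atom("C", sp3=True)] + [Atom("C", in_ring=True, sp3=True) for _ in range(6)]
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 7, 1), (7, 2, 1)]
for piece in fragment(atoms, bonds):
    print(piece)
```

Here the C–F and CH2–ring bonds are cut, giving three pieces (F, CH2, and the ring) that share labels 1 and 2. Ring bonds and double bonds are never cut under this rule, and each cut contributes the same label to both pieces it separates, mirroring the labelled dummy atoms that FragmentOnBonds can produce.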


# Authors
Kevin Lefrancois-Gagnon
Robert C. Mawhinney

# Installation prior to distribution
```
pip install git+https://github.com/kmlefran/group_decomposition
```


# Usage Examples

## Identifying fragments in a single molecule

Passing any SMILES string to identify_connected_fragments returns the identified fragments for that molecule in a pandas data frame. Fragments carry connectivity information as labelled dummy atoms: wherever a bond was broken to produce a fragment, there is a placeholder atom (*). This atom carries a label and appears in the SMILES code as \[n\*\] for an integer n. The same \[n\*\] appears in one other fragment, the one on the other side of the broken bond. Each broken bond is assigned a different n, numbered from 1 up to the number of broken bonds.

```
fragFrame = identify_connected_fragments(smile='C1C(C)CCCC1')  # methylcyclohexane
```
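Because each broken bond leaves the same \[n\*\] label on exactly two fragments, the connectivity can be recovered from the SMILES strings alone. The sketch below is a hypothetical helper, not part of the package: `pair_attachments` scans fragment SMILES with a regular expression and maps each label to the two fragments that share it. The fragment list is hand-written for `FCC1CCCCC1` and is not actual output of identify_connected_fragments.

```python
import re
from collections import defaultdict

# Matches a labelled placeholder such as [1*] or [12*] and captures n.
LABEL = re.compile(r"\[(\d+)\*\]")


def pair_attachments(frag_smiles: list[str]) -> dict[int, tuple[int, int]]:
    """Map each attachment label n to the indices of the two fragments carrying [n*]."""
    owners: dict[int, list[int]] = defaultdict(list)
    for idx, smi in enumerate(frag_smiles):
        for n in LABEL.findall(smi):
            owners[int(n)].append(idx)
    for n, idxs in owners.items():
        # Each broken bond should leave exactly two placeholders with the same label.
        if len(idxs) != 2:
            raise ValueError(f"label {n} occurs {len(idxs)} times, expected 2")
    return {n: (idxs[0], idxs[1]) for n, idxs in sorted(owners.items())}


# Illustrative, hand-written fragments (not actual library output).
frags = ["[1*]F", "[1*]C[2*]", "[2*]C1CCCCC1"]
print(pair_attachments(frags))  # {1: (0, 1), 2: (1, 2)}
```

A label that appears only once signals that a fragment is missing from the list, which is why the helper raises instead of silently returning a partial map.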

The output above includes every fragment individually; for example, a molecule with two fluorine atoms yields both \[1\*\]-F and \[2\*\]-F. To remove the connectivity information and count the number of unique fragments, use the code below, where fragFrame is a frame returned by identify_connected_fragments. dropAttachments is a Boolean that defaults to False. While it is False, placeholder atoms remain in all fragments with more than one atom. As a consequence, otherwise similar fragments will not match if they differ in connectivity (for example, ortho- and para-substituted aromatic rings will not match). If you want such cases to match, set dropAttachments=True.

The code below returns a data frame similar to that of identify_connected_fragments, but with a 'count' column giving the number of times each unique fragment occurs, and with SMILES that lack connectivity information.

```
count_uniques(fragFrame, dropAttachments)
```
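The effect of dropAttachments can be illustrated at the string level. This is a simplified sketch, not the package's count_uniques: the helper names are hypothetical, the inputs are hand-written and assumed already canonical, and the regex shortcut for removing placeholders only handles SMILES where each \[n\*\] is terminal or sits alone in a branch. A real implementation would edit the molecule graph (for example with RDKit) rather than the string.

```python
import re
from collections import Counter

LABELLED_DUMMY = re.compile(r"\[\d+\*\]")
# A placeholder that sits alone in a branch, e.g. "([4*])".
BRANCH_DUMMY = re.compile(r"\(\[\d+\*\]\)")


def strip_labels(smi: str) -> str:
    """Keep attachment points but forget which bond they came from: [n*] -> *."""
    return LABELLED_DUMMY.sub("*", smi)


def drop_attachments(smi: str) -> str:
    """Remove placeholders entirely (string-level shortcut, simple SMILES only)."""
    return LABELLED_DUMMY.sub("", BRANCH_DUMMY.sub("", smi))


def count_unique_fragments(frag_smiles: list[str], drop: bool = False) -> Counter:
    """Count fragments after normalising away their connectivity labels."""
    normalise = drop_attachments if drop else strip_labels
    return Counter(normalise(s) for s in frag_smiles)


# Hand-written fragments: an ortho- and a para-substituted ring, and two F atoms.
frags = ["[1*]c1ccccc1[2*]", "[3*]c1ccc([4*])cc1", "[5*]F", "[6*]F"]
print(count_unique_fragments(frags))
print(count_unique_fragments(frags, drop=True))
```

With `drop=False` the two rings stay distinct (`*c1ccccc1*` versus `*c1ccc(*)cc1`) while the two fluorines merge into `*F` with a count of 2; with `drop=True` both rings collapse to `c1ccccc1`, matching the ortho/para behaviour described above.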

## Identifying fragments in a set of molecules

If you have a set of molecules and wish to identify the unique fragments in the set and count how many times each one occurs, use the code below. dropAttachments is defined as above, and listOfSmiles is a list whose elements are the SMILES of the molecules, e.g. ['CC', 'CCF']. The output is similar to that of count_uniques, but with rows for all fragments in the set of molecules rather than just one.

```
count_groups_in_set(listOfSmiles, dropAttachments)
```
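The batch-level idea reduces to pooling per-molecule counts. The sketch below is hypothetical and is not count_groups_in_set: `count_fragments_in_set` takes a caller-supplied `decompose` function standing in for a per-molecule fragmenter, and the toy decompositions are hand-written rather than produced by the package.

```python
import re
from collections import Counter
from typing import Callable, Iterable

LABELLED_DUMMY = re.compile(r"\[\d+\*\]")


def count_fragments_in_set(
    smiles_list: Iterable[str], decompose: Callable[[str], list[str]]
) -> Counter:
    """Pool fragment counts over many molecules.

    `decompose` is a hypothetical per-molecule fragmenter returning labelled
    fragment SMILES. Labels are only meaningful within one molecule, so they
    are stripped before fragments from different molecules are compared.
    """
    totals: Counter = Counter()
    for smi in smiles_list:
        totals.update(LABELLED_DUMMY.sub("*", frag) for frag in decompose(smi))
    return totals


# Hand-written toy decompositions standing in for a real fragmenter.
toy = {"CCO": ["[1*]CC", "[1*]O"], "CCCO": ["[1*]CCC", "[1*]O"]}
print(count_fragments_in_set(["CCO", "CCCO"], toy.__getitem__))
```

Stripping labels per molecule before pooling matters because \[1\*\] in one molecule has no relation to \[1\*\] in another; here the hydroxyl fragment `*O` is counted twice across the set while each alkyl chain is counted once.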


            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "group-decomposition",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "qtaim,functional groups",
    "author": "",
    "author_email": "Kevin Lefrancois-Gagnon <kgagnon@lakeheadu.ca>, Robert Mawhinney <mawhinn@lakeheadu.ca>",
    "download_url": "https://files.pythonhosted.org/packages/75/0d/e5328f29f3240bfc18347f9affb4ea719b8efd8cd91370b798f18850656b/group_decomposition-0.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A plugin for extracting data from .sum files and manipuating them",
    "version": "0.6.0",
    "project_urls": {
        "Source": "https://github.com/kmlefran/group_decomposition"
    },
    "split_keywords": [
        "qtaim",
        "functional groups"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5044e2ae12a57d456360c80744bf82ea690ebc26f66710b2e33e78a4125dc799",
                "md5": "14b1f6b8364923500b9df55f76b8331c",
                "sha256": "4cb8e114880fd2feed8550d080a53b2b62bd5b7288319b50e43bbb6c55518984"
            },
            "downloads": -1,
            "filename": "group_decomposition-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "14b1f6b8364923500b9df55f76b8331c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 27408,
            "upload_time": "2024-01-19T16:36:20",
            "upload_time_iso_8601": "2024-01-19T16:36:20.299032Z",
            "url": "https://files.pythonhosted.org/packages/50/44/e2ae12a57d456360c80744bf82ea690ebc26f66710b2e33e78a4125dc799/group_decomposition-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "750de5328f29f3240bfc18347f9affb4ea719b8efd8cd91370b798f18850656b",
                "md5": "359bd3ae9785dc0ee82ee6467c1c9eb0",
                "sha256": "9079ba951cd4f1e953b5b60947aabe9833f0bff97e965bf44854a6f6100b9e42"
            },
            "downloads": -1,
            "filename": "group_decomposition-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "359bd3ae9785dc0ee82ee6467c1c9eb0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 27397,
            "upload_time": "2024-01-19T16:36:21",
            "upload_time_iso_8601": "2024-01-19T16:36:21.257716Z",
            "url": "https://files.pythonhosted.org/packages/75/0d/e5328f29f3240bfc18347f9affb4ea719b8efd8cd91370b798f18850656b/group_decomposition-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-19 16:36:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kmlefran",
    "github_project": "group_decomposition",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "group-decomposition"
}
        