| Field | Value |
|---|---|
| Name | lXtractor |
| Version | 0.1.6 |
| Summary | Feature extraction library for sequences and structures |
| Upload time | 2024-08-03 17:28:19 |
| Requires Python | >=3.10 |
| Keywords | bioinformatics, data_mining, feature_extracton, structural_biology |

# lXtractor

[![Coverage Status](https://coveralls.io/repos/github/edikedik/lXtractor/badge.svg?branch=master)](https://coveralls.io/github/edikedik/lXtractor?branch=master)
[![Documentation status](https://readthedocs.org/projects/lxtractor/badge/?version=latest)](https://lxtractor.readthedocs.io/en/latest/?badge=latest)
[![PyPi status](https://img.shields.io/pypi/v/lXtractor.svg)](https://pypi.org/project/lXtractor)
[![Python version](https://img.shields.io/pypi/pyversions/lXtractor.svg)](https://pypi.org/project/lXtractor)
[![Hatch project](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://github.com/pypa/hatch)

<img src="./fig/lXt_diagram.png" alt="lXt_diagram" width="300"/>

## Introduction

`lXtractor` is a toolbox for feature extraction from macromolecular
sequences and structures.
It's tailored towards creating shareable local data collections anchored to
a reference sequence-based object: a single sequence, an MSA, or an HMM.
Currently, it doesn't define any unique algorithms, aiming instead at
simplicity and transparency.
It provides a (hopefully) convenient interface that simplifies mundane tasks,
such as fetching data, extracting domains, mapping sequences, and computing
sequence and structure variables.
Anchoring sequences and structures to a single reference object makes them
easier to interpret in downstream applications, such as fitting interpretable
ML models.

## Installation

`lXtractor` requires Python >= 3.10 on a Unix system and can be installed
via pip:

```bash
pip install lXtractor
```

We encourage users to first create a virtual environment via `conda` or `mamba`.

## Usage

`lXtractor` is designed to be flexible: its usage is defined by the initial
hypothesis or by a reference object that one wants to extrapolate to
existing sequences or structures.
Below, we provide a very abstract description of what this package is
intended for.

In creating data collections, one could define the following steps:

1. Assemble the data.
2. Map the reference object to the sequences of the assembled entries.
3. Filter hits.
4. Define and calculate variables -- sequence or structure descriptors.
5. Save the data for later usage or modifications.
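
The five steps above can be sketched in plain Python. This is a toy illustration that does not use `lXtractor` itself: the sequences are made up, an exact substring search stands in for the alignment- or HMM-based mapping, and the names `map_reference` and `hydrophobic_fraction` are hypothetical.

```python
from __future__ import annotations

import json

# Step 1: assemble the data (made-up toy sequences keyed by an identifier).
sequences = {
    "seq_a": "MKTAYIAKQRGLVVAG",
    "seq_b": "GGSAYIAKQRPP",
    "seq_c": "MSTNPKPQRK",
}

# The reference object: here just a short motif. A real workflow would use
# a reference sequence, an MSA, or an HMM.
REFERENCE = "AYIAKQR"


def map_reference(seq: str, ref: str) -> tuple[int, int] | None:
    """Return 1-based inclusive boundaries of `ref` within `seq`, or None."""
    start = seq.find(ref)
    if start == -1:
        return None
    return start + 1, start + len(ref)


def hydrophobic_fraction(seq: str) -> float:
    """Toy sequence descriptor: fraction of A, I, L, M, F, V, W residues."""
    return sum(seq.count(aa) for aa in "AILMFVW") / len(seq)


# Step 2: map the reference to every assembled sequence.
mappings = {name: map_reference(seq, REFERENCE) for name, seq in sequences.items()}

# Step 3: filter hits, keeping only sequences where the reference was found.
hits = {name: bounds for name, bounds in mappings.items() if bounds is not None}

# Step 4: calculate variables for each hit.
table = [
    {
        "id": name,
        "start": start,
        "end": end,
        "length": len(sequences[name]),
        "hydrophobic_fraction": hydrophobic_fraction(sequences[name]),
    }
    for name, (start, end) in hits.items()
]

# Step 5: serialize the results for later usage or modifications.
serialized = json.dumps(table, indent=2)
```

In an actual project, each step would be delegated to the corresponding `lXtractor` objects described below rather than to these stand-ins.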

`lXtractor` defines objects and routines that are helpful throughout this
process.
Namely, `PDB`, `SIFTS`, `AlphaFold`, and `fetch_uniprot()`
can aid in the first step.
Then, `Alignment` and `PyHMMer` can facilitate step 2.
At the end of step 2, one will get a collection of `Chain*`-type objects.
For sequence-only collections, these are going to be
`ChainSequence` objects.
For structure-only data, these are going to be `ChainStructure` containers,
embedding `ChainSequence` and `GenericStructure` objects.
Finally, dealing with mappings between a canonical sequence and a group of
structures associated with it will result in `Chain` objects.
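
To make the containment relationships between these object types easier to picture, here is a minimal sketch using plain dataclasses. The classes `ToySequence`, `ToyStructure`, and `ToyChain` are hypothetical stand-ins written for this illustration. They are not the library's classes and do not reflect their actual attributes or constructors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySequence:
    """Stand-in for a sequence-only entry: a named one-letter sequence."""

    name: str
    seq: str


@dataclass
class ToyStructure:
    """Stand-in for a structure entry that embeds its own sequence."""

    source_id: str
    seq: ToySequence
    coords: list[tuple[float, float, float]] = field(default_factory=list)


@dataclass
class ToyChain:
    """Stand-in for a canonical sequence grouped with its structures."""

    seq: ToySequence
    structures: list[ToyStructure] = field(default_factory=list)


# Made-up data: a canonical sequence and one structure covering a fragment.
canonical = ToySequence("canonical", "MKTAYIAKQR")
fragment = ToyStructure(
    source_id="toy_structure_1",
    seq=ToySequence("toy_structure_1_seq", "TAYIAK"),
    coords=[(0.0, 0.0, 0.0)] * 6,  # one placeholder coordinate per residue
)
chain = ToyChain(seq=canonical, structures=[fragment])
```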

`ChainList` wraps `Chain*`-type objects into a list-like collection with
useful operations that allow one to quickly filter and bulk-modify
`Chain*`-type objects.
Thus, filtering typically comes down to using the `ChainList.filter()` method,
which accepts a `Callable[[Chain*], bool]` and returns a filtered `ChainList`.
One can save and load the collected objects using `ChainIO` and proceed
with the feature extraction.
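
The predicate-based filtering pattern described above can be illustrated with a small self-contained sketch. `ToyList` and `ToySequence` are hypothetical stand-ins, not `ChainList` or `ChainSequence`. The sketch only mirrors the idea of a `filter()` method that takes a callable returning `bool` and returns a new collection of the same type.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySequence:
    """Hypothetical stand-in for a sequence entry."""

    name: str
    seq: str


class ToyList:
    """Hypothetical list-like collection with predicate-based filtering."""

    def __init__(self, items: Iterable[ToySequence] = ()):
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[ToySequence]:
        return iter(self._items)

    def filter(self, pred: Callable[[ToySequence], bool]) -> ToyList:
        """Return a new collection with the items for which `pred` is True."""
        return ToyList(item for item in self._items if pred(item))


chains = ToyList(
    [
        ToySequence("a", "MKTAYIAKQRGLVVAG"),
        ToySequence("b", "MSTNP"),
        ToySequence("c", "GGSAYIAKQRPP"),
    ]
)

# Because filter() returns the same collection type, calls can be chained.
long_chains = chains.filter(lambda c: len(c.seq) >= 10)
with_motif = long_chains.filter(lambda c: "AYIAK" in c.seq)
```

Returning a new collection instead of mutating in place keeps the original data intact, so several filtered views can be derived from the same starting set.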

`lXtractor` defines various sequence and structure variables.
Variable-related operations are handled by `GenericCalculator` and
`Manager` classes. The former defines the calculation strategy and how
the calculations are parallelized, while the latter handles the calculations
and aggregates the results into a pandas `DataFrame`.
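
The division of labor between a calculator and a manager can be sketched as follows. This is a simplified illustration under stated assumptions: `ToyCalculator`, `ToyManager`, and the two variables are hypothetical, threads are used as one possible parallelization strategy, and the results are aggregated into a list of row dictionaries rather than a pandas `DataFrame` to keep the example dependency-free. It does not reproduce the interfaces of `GenericCalculator` or `Manager`.

```python
from __future__ import annotations

from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# A "variable" here is simply a named function of a sequence string.
Variable = Callable[[str], float]


def seq_length(seq: str) -> float:
    return float(len(seq))


def glycine_fraction(seq: str) -> float:
    return seq.count("G") / len(seq)


class ToyCalculator:
    """Decides how (variable, input) jobs are executed: serially or in threads."""

    def __init__(self, num_workers: int = 1):
        self.num_workers = num_workers

    def map(self, jobs: list[tuple[Variable, str]]) -> list[float]:
        if self.num_workers <= 1:
            return [var(seq) for var, seq in jobs]
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            # pool.map preserves the order of the submitted jobs.
            return list(pool.map(lambda job: job[0](job[1]), jobs))


class ToyManager:
    """Stages the jobs, runs them via a calculator, and aggregates rows."""

    def calculate(
        self,
        sequences: dict[str, str],
        variables: list[Variable],
        calculator: ToyCalculator,
    ) -> list[dict[str, object]]:
        jobs = [(var, seq) for seq in sequences.values() for var in variables]
        results = iter(calculator.map(jobs))
        rows: list[dict[str, object]] = []
        for seq_id in sequences:
            row: dict[str, object] = {"id": seq_id}
            for var in variables:
                row[var.__name__] = next(results)
            rows.append(row)
        return rows


rows = ToyManager().calculate(
    {"a": "GGAK", "b": "MKT"},
    [seq_length, glycine_fraction],
    ToyCalculator(num_workers=2),
)
```

A list of row dictionaries like `rows` can be passed directly to `pandas.DataFrame` to obtain a table with one row per object and one column per variable.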

As a result, one is left with a collection of `Chain*`-type objects and a
table with calculated variables. In addition, one can store the calculated
variables within the objects themselves, although we currently do not encourage
this practice.

`lXtractor` is in the experimental stage and under active development.
Thus, objects' interfaces may change.

For the time being, one can check the following examples:

1. [Finding sequence determinants](https://eboruta.readthedocs.io/en/latest/notebooks/sequence_determinants_tutorial.html)
   of tyrosine and serine-threonine kinases.
2. [A protocol](https://github.com/edikedik/kinactive/blob/abae9c8a1fca0754d02e3f117dee210b587e666b/kinactive/db.py#L142)
   to build a complete structural collection of protein kinase domains.

More examples are to come in the future, so stay tuned. If you know a good use case for `lXtractor`, feel free to raise an issue or reach out at [ivan.reveguk@gmail.com](mailto:ivan.reveguk@gmail.com).
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "lXtractor",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Ivan Reveguk <ivan.reveguk@gmail.com>",
    "keywords": "bioinformatics, data_mining, feature_extracton, structural_biology",
    "author": null,
    "author_email": "Ivan Reveguk <ivan.reveguk@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/2c/19/3927e602bbd4c37222a62f93b287dc02ff36ad8336bba75f41ba387c91dd/lxtractor-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Feature extraction library for sequences and structures",
    "version": "0.1.6",
    "project_urls": {
        "Bug Tracker": "https://github.com//edikedik/lXtractor/issues",
        "Source code": "https://github.com/edikedik/lXtractor"
    },
    "split_keywords": [
        "bioinformatics",
        " data_mining",
        " feature_extracton",
        " structural_biology"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f84f8f3fc8115fb3bdf861e4c86011c38b67b77c485a19e1e7ba99fb3e2cfc82",
                "md5": "835699ff432dc5c332bf9fa829c78b46",
                "sha256": "c89cc44d54ba8fd86dd4cd4e0daec6b3d98cecf4ba393b0eda2a95c7cca1fa88"
            },
            "downloads": -1,
            "filename": "lxtractor-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "835699ff432dc5c332bf9fa829c78b46",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 214931,
            "upload_time": "2024-08-03T17:28:15",
            "upload_time_iso_8601": "2024-08-03T17:28:15.487614Z",
            "url": "https://files.pythonhosted.org/packages/f8/4f/8f3fc8115fb3bdf861e4c86011c38b67b77c485a19e1e7ba99fb3e2cfc82/lxtractor-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2c193927e602bbd4c37222a62f93b287dc02ff36ad8336bba75f41ba387c91dd",
                "md5": "52a685faf5866f5947c408eb7f835df1",
                "sha256": "397b162debc11930f123c9ee48eebed9d9a9de17bb592f66dcabb3912411395e"
            },
            "downloads": -1,
            "filename": "lxtractor-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "52a685faf5866f5947c408eb7f835df1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 187108,
            "upload_time": "2024-08-03T17:28:19",
            "upload_time_iso_8601": "2024-08-03T17:28:19.471577Z",
            "url": "https://files.pythonhosted.org/packages/2c/19/3927e602bbd4c37222a62f93b287dc02ff36ad8336bba75f41ba387c91dd/lxtractor-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-03 17:28:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "edikedik",
    "github_project": "lXtractor",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "lxtractor"
}
        