strnaming


* Name: strnaming
* Version: 1.2.0
* Home page: https://fdstools.nl
* Summary: STRNaming STR Sequence Nomenclature
* Upload time: 2024-01-11 17:15:30
* Author: Jerry Hoogenboom
* Requires Python: >=3.5
* License: LGPLv3+
* Keywords: bioinformatics, forensics, ngs, mps, dna, sequencing, str, nomenclature

STRNaming
=========
STRNaming is an algorithm for generating simple, informative names for Short
Tandem Repeat (STR) sequences, such as those used in the field of forensic
genetics, in a standardised and automated manner.


Requirements
------------
STRNaming requires Python version 3.5 or later.


Installation
------------
The recommended way to install STRNaming is by using the `pip` package
installer. If you have `pip` installed, you can easily install STRNaming by
running the following command:

    pip install strnaming

Alternatively, STRNaming can be installed from a copy of the source code by running:

    python setup.py install


Usage
-----
This version of STRNaming comes with a command-line interface that generates
allele names for sequence data using the ranges and sequence orientation of the
"Flanking Region Report" of the Universal Analysis Software for the ForenSeq
DNA Signature Prep Kit (Verogen). A user-friendly web version of STRNaming that
lets you set your own sequence ranges is available at
https://fdstools.nl/strnaming. For more general command-line usage, it is
currently recommended to install FDSTools and use the `fdstools seqconvert`
tool to access STRNaming. Please refer to https://fdstools.nl for more
information.

### Command-line interface
The command-line help can be accessed by running `strnaming --help`. In short,
an STRNaming command looks like this:

    strnaming name-sequences --ranges uas-frr inputfile.txt outputfile.txt

The input file should have a marker name and a sequence on each line, separated
by whitespace (i.e., tabs or spaces).
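
As a minimal sketch of this input format, the Python snippet below builds such a
file from (marker, sequence) pairs. The helper name `format_input_lines`, the
file location, and the sequences are made up for illustration; only the layout
of one whitespace-separated pair per line comes from the description above.

```python
import os
import tempfile


def format_input_lines(pairs):
    """Return input text with one 'marker<TAB>sequence' pair per line."""
    return "".join("{}\t{}\n".format(marker, seq) for marker, seq in pairs)


# Placeholder data: these sequences are arbitrary and are not real alleles.
example_pairs = [
    ("D6S474", "AGATAGATAGATAGAT"),
    ("HPRTB", "ATCTATCTATCT"),
]

input_text = format_input_lines(example_pairs)

# Write the text to a file that could be passed to STRNaming as the input file.
input_path = os.path.join(tempfile.mkdtemp(), "inputfile.txt")
with open(input_path, "w") as handle:
    handle.write(input_text)
```

A tab is used as the separator here, but any whitespace should work according
to the description above.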

If no output file is given, the output is written to `stdout`, which normally
shows up in your command line window. If no input file is given either,
STRNaming will read input from `stdin`, allowing you to type the input one line
at a time.

### Programming interface
It is **not recommended** to `import` and use parts of this version of
STRNaming directly from other Python code, because the internal API is not
stable yet. Instead, use the `subprocess` module if you want to use STRNaming
in your Python project at this time. As an added benefit, it will run in a
concurrent process, meaning your code does not (necessarily) have to wait for
STRNaming to finish.

To use STRNaming in other software projects, regardless of the programming
language, run it as a separate subprocess. Write a marker name, a whitespace
character, the DNA sequence, and a newline character (`\n`) to its standard
input stream (`stdin`), and STRNaming will write the same marker name, a tab
character, the allele name, and a newline character to its standard output
stream (`stdout`). Any errors are reported on the standard error stream
(`stderr`) and cause the STRNaming process to terminate. When the
`--unbuffered` command-line switch is specified, STRNaming flushes its output
stream immediately after every line of output.
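
The exchange described above could be wrapped in Python roughly as follows.
This is a sketch under stated assumptions, not an official interface: the
function names are hypothetical, the whole batch is sent in one go (so
`--unbuffered` is not needed), and it assumes that STRNaming exits with a
non-zero status after reporting an error.

```python
import subprocess


def format_request(pairs):
    """Build stdin text: marker, whitespace, sequence, newline per pair."""
    return "".join("{} {}\n".format(marker, seq) for marker, seq in pairs)


def parse_response(text):
    """Parse stdout text: marker, tab, allele name, newline per result."""
    results = []
    for line in text.splitlines():
        if line:
            marker, allele_name = line.split("\t", 1)
            results.append((marker, allele_name))
    return results


def name_sequences(pairs):
    """Run STRNaming once on all pairs; requires `strnaming` on the PATH."""
    completed = subprocess.run(
        ["strnaming", "name-sequences", "--ranges", "uas-frr"],
        input=format_request(pairs),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if completed.returncode != 0:
        # Assumption: a failed run is signalled by a non-zero exit status.
        # The error text itself is taken from stderr, as described above.
        raise RuntimeError(completed.stderr)
    return parse_response(completed.stdout)
```

Calling `name_sequences([(marker, sequence), ...])` would then return a list of
`(marker, allele_name)` tuples. Because no input or output file is given on the
command line, STRNaming reads from `stdin` and writes to `stdout`.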

A more capable command-line interface to better support programmatic access to
STRNaming will be introduced in a future release.

### Offline use
STRNaming automatically downloads and caches portions of the reference sequence
from the Ensembl REST API (http://rest.ensembl.org). If you are running
STRNaming on a system without internet access and you need a piece of
reference sequence that was not bundled with the STRNaming package, a message
will ask you to store the reference sequence manually in a specific location.
To obtain it, run the following command on a system with internet access to
download the sequence:

    strnaming refseq-cache chr2:1489653..1489689

Upon success, the location of the downloaded cache files is displayed. Copy
these files to the offline system so that STRNaming can use them.
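
If several ranges need to be cached, the command line can be assembled from
Python. The helper below is a hypothetical convenience that only builds the
argument list in the `CHROMOSOME:START..END` form shown in the command above;
whether other chromosome spellings are accepted is not stated here.

```python
def refseq_cache_command(chromosome, start, end):
    """Build the argument list for `strnaming refseq-cache CHR:START..END`."""
    if start > end:
        raise ValueError("start must not exceed end")
    location = "{}:{}..{}".format(chromosome, start, end)
    return ["strnaming", "refseq-cache", location]


command = refseq_cache_command("chr2", 1489653, 1489689)
print(" ".join(command))  # strnaming refseq-cache chr2:1489653..1489689
```

The resulting list can be passed to `subprocess.run` on a system with internet
access.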


Release Notes
-------------
### Version 1.2.0 (11 January 2024)
Naming of some loci has been updated as a result of bug fixes and improvements
to the algorithm. Most notably, reference sequence analysis has been redesigned
in such a way that it is no longer affected by the range of reference sequence
analysed at once.
* Updated CE allele numbering of D6S474 (-1 unit).
* Maximum resource usage can now be controlled by setting environment variables
  STRNAMING_MAX_SECONDS (float, default 30.0),
  STRNAMING_MAX_SECONDS_REFSEQ (float, default 300.0) and
  STRNAMING_MAX_SCAFFOLDS (int, default 5000000).
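
As a rough illustration of the environment variables listed above, the sketch
below prepares an environment for a STRNaming subprocess from Python. The
function name `strnaming_env` and the example values are assumptions; only the
variable names and their types come from the list above.

```python
import os


def strnaming_env(max_seconds=None, max_seconds_refseq=None, max_scaffolds=None):
    """Return a copy of the current environment with the given limits set."""
    env = dict(os.environ)
    if max_seconds is not None:
        env["STRNAMING_MAX_SECONDS"] = str(float(max_seconds))
    if max_seconds_refseq is not None:
        env["STRNAMING_MAX_SECONDS_REFSEQ"] = str(float(max_seconds_refseq))
    if max_scaffolds is not None:
        env["STRNAMING_MAX_SCAFFOLDS"] = str(int(max_scaffolds))
    return env


# Example values only; variables left as None keep whatever is already set.
env = strnaming_env(max_seconds=60, max_scaffolds=1000000)
```

The returned dictionary can be passed as `env=env` to `subprocess.run` when
starting STRNaming.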

### Version 1.1.4 (7 February 2023)
* Repeat stretches that fall completely in the prefix or suffix are now ignored.
* Sequences that follow the same repeat pattern as the reference sequence are
  now named much more quickly while consistently using the same structure.
* Added capability to load reference structures from many locations on one
  chromosome in a single pass.

### Version 1.1.3 (18 August 2022)
* Fixed an issue that caused STRNaming to sometimes favour a longer name with
  the same score.

### Version 1.1.2 (10 May 2022)
* Updated CE allele numbering of DYS612 (+6 units).
* Added reference structure for SE33.
* Updated hardcoded reference length adjustment table to suppress second
  structure 5' of DYS522.
* Added double-click-to-toggle-text-alignment feature to HTML output.
* Fixed mtDNA reference sequence download URL.

### Version 1.1.1 (19 July 2021)
* Fixed an issue with CE allele numbering that occurred for reporting ranges
  that started or ended halfway into a structure with a hardcoded reference
  length adjustment.
* Updated table of hardcoded reference length adjustments to include more loci.

### Version 1.1.0 (15 July 2021)
Naming of some loci has been updated as a result of bug fixes and improvements
to the algorithm. Scoring criteria have been updated to minimize unintended
side-effects of these changes.
* Fixed a major issue with HPRTB allele numbering: previously, the CE allele
  number calculated for a given sequence was one higher than it should be.
* Allele names are now permitted to contain repeats of a unit that exceeds the
  dominant unit length of a locus. This change greatly improves naming of some
  complex Y-STRs.
* Short repeat stretches that only partially overlap with a significant repeat
  of a longer unit are no longer discarded. This change may introduce short
  repeats adjacent to longer repeats of a longer unit, which were previously
  'missed' by STRNaming.
* Fixed a bug that disallowed making interruptions which could be filled exactly
  with an 'orphan' repeat, thereby forcing the use of a compatible 'anchor'.
* Reference sequence analysis now guarantees that all repeat units in the
  final result are actually repeated.
* Reference repeat units only found outside the reported range are now included
  in the list of preferred units when generating allele names. This change
  improves naming stability when a significant part of the reference STR
  structure lies outside the reported range.
* STRNaming will no longer consider names that include an interruption of which
  the sequence is equal to an adjacent repeat unit (e.g., CCTA[2]CCTA[1]TCTA[2]).
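
To illustrate the last point, the following sketch detects a bracketed repeat
description in which two neighbouring blocks use the same unit, as in the
example above. It is not STRNaming's own implementation: the function names are
hypothetical, and it only handles strings in the `UNIT[count]` form shown in
that example.

```python
import re

# Matches one block such as CCTA[2]: a unit sequence and its repeat count.
TOKEN = re.compile(r"([ACGT]+)\[(\d+)\]")


def parse_blocks(name):
    """Split a bracketed repeat description into (unit, count) pairs."""
    return [(unit, int(count)) for unit, count in TOKEN.findall(name)]


def has_redundant_interruption(name):
    """Return True if two neighbouring blocks use the same unit sequence."""
    blocks = parse_blocks(name)
    return any(a[0] == b[0] for a, b in zip(blocks, blocks[1:]))


print(has_redundant_interruption("CCTA[2]CCTA[1]TCTA[2]"))  # True
print(has_redundant_interruption("CCTA[3]TCTA[2]"))  # False
```

In the first string, the interruption `CCTA[1]` could simply be merged into the
preceding repeat, giving `CCTA[3]TCTA[2]`.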

New features:
* The built-in reference sequence cache was introduced, along with the new
  mandatory ACTION command-line argument.
* Colored output in HTML format is now available by using the --html
  command-line argument.
* Reference sequence analysis results of almost the entire human genome have
  been embedded into the package.

### Version 1.0.0 (21 December 2020)
Initial release of STRNaming.

            

Raw data

{
    "_id": null,
    "home_page": "https://fdstools.nl",
    "name": "strnaming",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.5",
    "maintainer_email": "",
    "keywords": "bioinformatics forensics NGS MPS DNA sequencing STR nomenclature",
    "author": "Jerry Hoogenboom",
    "author_email": "jerryhoogenboom@outlook.com",
    "download_url": "https://files.pythonhosted.org/packages/03/91/f1e112142d1742c92e62b26a9dabb3c4e206cac33491f560c1ec6f00a8b6/strnaming-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LGPLv3+",
    "summary": "STRNaming STR Sequence Nomenclature",
    "version": "1.2.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/Jerrythafast/STRNaming/issues",
        "Homepage": "https://fdstools.nl",
        "Source Code": "https://github.com/Jerrythafast/STRNaming"
    },
    "split_keywords": [
        "bioinformatics",
        "forensics",
        "ngs",
        "mps",
        "dna",
        "sequencing",
        "str",
        "nomenclature"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7b03fddd5b9c425c5f6f60b69b8f0b29d9f5e78344c5283dafec963c465883db",
                "md5": "c5f9c36bc6acf7563a329c7ed7feb452",
                "sha256": "b2fcdbd50376e9e8a3db8a4cfcdbf4883614b8d81cd0e5b5e5c815e7464c8a2b"
            },
            "downloads": -1,
            "filename": "strnaming-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c5f9c36bc6acf7563a329c7ed7feb452",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.5",
            "size": 10229246,
            "upload_time": "2024-01-11T17:15:27",
            "upload_time_iso_8601": "2024-01-11T17:15:27.104132Z",
            "url": "https://files.pythonhosted.org/packages/7b/03/fddd5b9c425c5f6f60b69b8f0b29d9f5e78344c5283dafec963c465883db/strnaming-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0391f1e112142d1742c92e62b26a9dabb3c4e206cac33491f560c1ec6f00a8b6",
                "md5": "bd48fd1f2ddd50ca53b2c6f57b3d3275",
                "sha256": "130f9af2acc59652ea415a61e9eb7af15230b30a9862f1173d188a87fec2cde1"
            },
            "downloads": -1,
            "filename": "strnaming-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "bd48fd1f2ddd50ca53b2c6f57b3d3275",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.5",
            "size": 10221416,
            "upload_time": "2024-01-11T17:15:30",
            "upload_time_iso_8601": "2024-01-11T17:15:30.333636Z",
            "url": "https://files.pythonhosted.org/packages/03/91/f1e112142d1742c92e62b26a9dabb3c4e206cac33491f560c1ec6f00a8b6/strnaming-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-11 17:15:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Jerrythafast",
    "github_project": "STRNaming",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "strnaming"
}
        