:Name: discovery-transition-ds
:Version: 4.18.14
:Home page: https://github.com/gigas64/discovery-transition-ds
:Summary: Data Science to production accelerator
:Upload time: 2023-07-03 15:30:54
:Author: Gigas64
:Requires Python: >=3.7
:License: BSD
:Keywords: wrangling ml visualisation dictionary discovery productize classification feature engineering cleansing

Project Hadron Data Science Tools and Synthetic Feature Builder
###############################################################

.. class:: no-web no-pdf

.. contents:: Table of Contents

Filling the Gap - Project Hadron
================================
Project Hadron has been built to bridge the gap between data scientists and data engineers, and more specifically
between machine learning business outcomes and the final product. It translates the work of data scientists into
meaningful, production-ready solutions that can be easily managed by product engineers.

Project Hadron is a core set of abstractions that form the foundation of the three key elements that represent data
science: (1) feature engineering, (2) the construction of synthetic data with simulators and generators, and
(3) statistics and machine learning algorithms for discovery and creating models. Project Hadron uniquely sees
data as ‘all the same’ (lazyprogrammer (2020) https://lazyprogrammer.me/all-data-is-the-same/), by which we mean
its origin, shape and size stay independent throughout the disciplines, so its content, form and structure can be
removed as a factor in the design and implementation of the components built.

Project Hadron has been designed to place data scientists in the familiar environment of machine learning and
statistical tools, extracting their ideas and translating them automagically into production-ready solutions
familiar to data engineers and Subject Matter Experts (SMEs).

Project Hadron provides a clear separation of concerns, whilst maintaining the original intentions of the data
scientist, that can be passed to a production team. It offers trust between the data science teams and product
teams. It brings with it transparency and traceability, dealing with bias, fairness, and knowledge. The resulting
outcome provides the product engineers with adaptability, robustness, and reuse; fitting seamlessly into a
microservices solution that can be language agnostic.

Project Hadron is designed using Microservices. Microservices - also known as the microservice architecture - is an
architectural pattern that structures an application as a collection of component services that are:

* Highly maintainable and testable
* Loosely coupled
* Independently deployable
* Highly reusable
* Resilient
* Technically independent

Component services are built for business capabilities and each service performs a single function. Because they are
independently run, each service can be updated, deployed, and scaled to meet demand for specific functions of an
application. Project Hadron microservices enable the rapid, frequent and reliable delivery of large, complex
applications. They also enable an organization to evolve its data science stack and experiment with innovative ideas.

At the heart of Project Hadron is a multi-tenant, NoSQL, singleton, in-memory data store that has minimal code and
functionality and has been custom built with Hadron tasks in mind. Abstracted from this is the component
store, which allows us to build a reusable set of methods that define each tenanted component and that sits separately
from the store itself. In addition, a dynamic key-value class provides labeling so that each tenant is not tied to
a fixed set of reference values unless it specifically chooses to be. The classes (the data store, the component
property manager, and the key-value pairs that make up the component) are all independent, giving complete
flexibility and a minimum code footprint to the build process of new components.

This is what gives us the Domain Contract for each tenant, which sits at the heart of what makes the contracts
reusable, translatable and transferable, and brings the data scientist closer to the production engineer while
building a production-ready component solution.
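
The idea of a singleton, multi-tenant, in-memory store can be illustrated with a minimal sketch. This is not the
Project Hadron implementation: the class name ``InMemoryStore``, its methods and the tenant names ``transition``
and ``wrangle`` are all hypothetical and chosen only to show the pattern.

.. code-block:: python

    from copy import deepcopy
    from threading import Lock


    class InMemoryStore:
        """Hypothetical singleton, multi-tenant, in-memory key-value store."""

        _instance = None
        _lock = Lock()

        def __new__(cls):
            # Every call returns the same object, so all components share one store.
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
                    cls._instance._tenants = {}
            return cls._instance

        def set(self, tenant, key, value):
            # Each tenant has its own namespace; values are copied on the way in.
            self._tenants.setdefault(tenant, {})[key] = deepcopy(value)

        def get(self, tenant, key, default=None):
            # Copy on the way out so callers cannot mutate stored state by accident.
            return deepcopy(self._tenants.get(tenant, {}).get(key, default))

        def tenants(self):
            # Sorted list of the tenant names currently held in the store.
            return sorted(self._tenants)


    store_a = InMemoryStore()
    store_b = InMemoryStore()  # the same object as store_a
    store_a.set("transition", "version", "0.1")
    store_a.set("wrangle", "version", "0.2")
    print(store_b.get("transition", "version"))
    print(store_b.tenants())

Because tenants are simply keys in a dictionary, nothing in the store needs to change when a new component type is
introduced, which is one way to keep the code footprint small.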

Main features
-------------

* Data Preparation
* Feature Selection
* Feature Engineering
* Feature Cataloguing
* Augmented Knowledge
* Synthetic Feature Build

Feature transformers
--------------------

Project Hadron is a Python library with multiple transformers to engineer and select features to use
across a synthetic build, statistics and machine learning.

* Missing data imputation
* Categorical encoding
* Variable Discretisation
* Outlier capping or removal
* Numerical transformation
* Redundant feature removal
* Synthetic variable creation
* Synthetic multivariate
* Synthetic model distributions
* Datetime features
* Time series

Project Hadron allows optimal parameters to be set for each transformer, so that different engineering
procedures can be applied to different variables and feature subsets.
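
As a rough illustration of applying differently parameterised transformers to different variables, the sketch below
uses plain Python rather than the Project Hadron API. The function names, the ``recipe`` structure and the sample
data are assumptions made for this example only.

.. code-block:: python

    from statistics import median


    def impute_median(values):
        """Replace None with the median of the observed values."""
        observed = [v for v in values if v is not None]
        fill = median(observed)
        return [fill if v is None else v for v in values]


    def cap_outliers(values, lower, upper):
        """Clip every value into the closed interval [lower, upper]."""
        return [min(max(v, lower), upper) for v in values]


    def discretise(values, edges, labels):
        """Label each value by the first bin whose upper edge it does not exceed."""
        out = []
        for v in values:
            for edge, label in zip(edges, labels):
                if v <= edge:
                    out.append(label)
                    break
            else:
                out.append(labels[-1])  # above every edge: last label
        return out


    def apply_recipe(table, recipe):
        """Run each column's list of (function, parameters) steps in order."""
        result = dict(table)
        for column, steps in recipe.items():
            for func, params in steps:
                result[column] = func(result[column], **params)
        return result


    # Hypothetical recipe: each variable gets its own transformers and parameters.
    recipe = {
        "age": [(impute_median, {}), (cap_outliers, {"lower": 18, "upper": 90})],
        "income": [(discretise, {"edges": [20000, 50000], "labels": ["low", "mid", "high"]})],
    }

    table = {"age": [25, None, 130, 40], "income": [15000, 32000, 75000, 50000]}
    engineered = apply_recipe(table, recipe)
    print(engineered)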

Background
----------
Born out of the frustration of time constraints and the inability to show business value
within the time a business expects, this project aims to provide a set of tools to quickly build production-ready
data science disciplines within a component-based solution, demonstrating coupling and cohesion between each
discipline and providing a separation of concerns between components.

It also aims to improve the communication outputs needed by ML delivery to talk to Pre-Sales, Stakeholders,
Business SMEs, Data SMEs, product coders and tooling engineers while still remaining within familiar code
paradigms.

Getting Started
===============

The ``discovery-transition-ds`` package is a set of Python components that are focussed on Data Science. They
are a concrete implementation of the Project Hadron abstract core. It is built to be very lightweight
in terms of package dependencies, requiring nothing beyond what would be found in a basic Data Science environment.
It is designed to be used easily within multiple Python-based interfaces such as Jupyter, an IDE or terminal Python.

Package Installation
--------------------

The best way to install the component packages is directly from the Python Package Index repository using pip.

The component package is ``discovery-transition-ds`` and is installed with pip:

.. code-block:: bash

    python -m pip install discovery-transition-ds

If you want to upgrade your current version, use pip with the upgrade flag:

.. code-block:: bash

    python -m pip install -U discovery-transition-ds

This will also install or update dependent third party packages. The dependencies are
limited to Python and related Data Science tooling such as pandas, numpy, scipy,
scikit-learn and the visual packages matplotlib and seaborn, and thus have a limited
footprint and are non-disruptive in a machine learning environment.
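
To see which of these dependencies are already present in an environment before installing, a small check such as
the following can help. It is only a sketch: the list holds the import names of the packages mentioned above (note
that scikit-learn is imported as ``sklearn``) and the helper name ``check_dependencies`` is made up for this example.

.. code-block:: python

    from importlib.util import find_spec

    # Import names of the dependencies listed above; scikit-learn imports as "sklearn".
    DEPENDENCIES = ["pandas", "numpy", "scipy", "sklearn", "matplotlib", "seaborn"]


    def check_dependencies(names=DEPENDENCIES):
        """Map each top-level module name to True if it can be found, else False."""
        return {name: find_spec(name) is not None for name in names}


    status = check_dependencies()
    for name, available in status.items():
        print(f"{name:<12} {'ok' if available else 'missing'}")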

Get the Source Code
-------------------

``discovery-transition-ds`` is actively developed on GitHub, where the code is
`always available <https://github.com/project-hadron/discovery-transition-ds>`_.

You can clone the public repository with:

.. code-block:: bash

    $ git clone git@github.com:project-hadron/discovery-transition-ds.git

Once you have a copy of the source, you can embed it in your own Python
package, or install it into your site-packages by running:

.. code-block:: bash

    $ cd discovery-transition-ds
    $ python -m pip install .

Release Process and Rules
-------------------------

For versions released after ``3.5.27``, the following rules govern
and describe how ``discovery-transition-ds`` produces a new release.

To find the current version of ``discovery-transition-ds``, from your
terminal run:

.. code-block:: bash

    $ python -c "import ds_discovery; print(ds_discovery.__version__)"
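
The installed version can also be read from the package metadata without importing the package itself. The sketch
below uses the standard library ``importlib.metadata`` module, which is available from Python 3.8 onwards; the helper
name ``installed_version`` is an assumption for this example.

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version


    def installed_version(distribution="discovery-transition-ds"):
        """Return the installed version string, or None if it is not installed."""
        try:
            return version(distribution)
        except PackageNotFoundError:
            return None


    print(installed_version())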

Major Releases
**************

A major release will include breaking changes. When it is versioned, it will
be versioned as ``vX.0.0``. For example, if the previous release was
``v10.2.7`` the next version will be ``v11.0.0``.

Breaking changes are changes that break backwards compatibility with prior
versions. If the project were to change an existing method's signature or
alter a class or method name, that would only happen in a Major release.
The majority of changes to the dependent core abstraction will result in a
major release. Major releases may also include miscellaneous bug fixes that
have significant implications.

Project Hadron is committed to providing a good user experience
and, as such, to preserving backwards compatibility as much as possible.
Major releases will be infrequent and will need strong justification before they
are considered.

Minor Releases
**************

A minor release will include additional methods, or noticeable changes to
code in a backward-compatible manner, and miscellaneous bug fixes. If the previous
version released was ``v10.2.7``, a minor release would be versioned as
``v10.3.0``.

Minor releases will be backwards compatible with releases that have the same
major version number. In other words, all versions that would start with
``v10.`` should be compatible with each other.
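
One way to express this rule in code is sketched below. The helper names are hypothetical, and the check makes one
interpretive assumption: an installed version satisfies a requirement when the major numbers match and the installed
version is not older than the required one.

.. code-block:: python

    def parse_version(text):
        """Turn 'v10.2.7' or '10.2.7' into the tuple (10, 2, 7)."""
        major, minor, patch = text.lstrip("v").split(".")
        return int(major), int(minor), int(patch)


    def is_backward_compatible(installed, required):
        """True when both share a major version and installed is not older."""
        inst, req = parse_version(installed), parse_version(required)
        return inst[0] == req[0] and inst >= req


    print(is_backward_compatible("v10.3.0", "v10.2.7"))
    print(is_backward_compatible("v11.0.0", "v10.2.7"))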

Patch Releases
**************

A patch release includes small and encapsulated code changes that do
not directly affect a Major or Minor release, for example changing
``round(...`` to ``np.around(...``, and bug fixes that were missed
when the project released the previous version. If the previous
version released was ``v10.2.7``, the patch release would be versioned
as ``v10.2.8``.
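
The three numbering rules can be summarised in a short function. This is an illustrative sketch only, not part of
the project's release tooling; the name ``bump`` and its arguments are assumptions.

.. code-block:: python

    def bump(version, release):
        """Return the next version string for a 'major', 'minor' or 'patch' release."""
        prefix = "v" if version.startswith("v") else ""
        major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
        if release == "major":
            major, minor, patch = major + 1, 0, 0  # breaking changes
        elif release == "minor":
            minor, patch = minor + 1, 0  # backward-compatible additions
        elif release == "patch":
            patch += 1  # small, encapsulated fixes
        else:
            raise ValueError(f"unknown release type: {release!r}")
        return f"{prefix}{major}.{minor}.{patch}"


    for kind in ("major", "minor", "patch"):
        print(kind, bump("v10.2.7", kind))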

Reference
=========

Python version
--------------

Python versions below 3.7 are not supported. It is recommended to install ``discovery-transition-ds`` against the
latest Python version whenever possible.
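
A script that depends on the package can guard against an unsupported interpreter early. The sketch below assumes
a minimum of Python 3.7, taken from the ``requires_python`` value ``>=3.7`` in the package metadata; the names
``MINIMUM_PYTHON`` and ``python_is_supported`` are hypothetical.

.. code-block:: python

    import sys

    # Assumed minimum, taken from the package metadata (requires_python >=3.7).
    MINIMUM_PYTHON = (3, 7)


    def python_is_supported(version_info=None):
        """Compare (major, minor) of the given or running interpreter to the minimum."""
        current = tuple((version_info or sys.version_info)[:2])
        return current >= MINIMUM_PYTHON


    if not python_is_supported():
        raise SystemExit("discovery-transition-ds needs Python 3.7 or later")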

Pandas version
--------------

Pandas 1.0.x and above are supported, but it is highly recommended to use at least the latest 1.0.x release, as 1.0
is the first major release of Pandas.

GitHub Project
--------------

discovery-transition-ds: `<https://github.com/project-hadron/discovery-transition-ds>`_.

Change log
----------

See `CHANGELOG <https://github.com/project-hadron/discovery-transition-ds/blob/master/CHANGELOG.rst>`_.


License
-------
This project uses the following license:
MIT License: `<https://opensource.org/license/mit/>`_.



Authors
-------

`Gigas64`_  (`@gigas64`_) created discovery-transition-ds.


.. _pip: https://pip.pypa.io/en/stable/installing/
.. _Github API: http://developer.github.com/v3/issues/comments/#create-a-comment
.. _Gigas64: http://opengrass.io
.. _@gigas64: https://twitter.com/gigas64



            

        