discovery-transition-ds


Namediscovery-transition-ds JSON
Version 2.9.9 PyPI version JSON
download
home_pagehttp://github.com/gigas64/discovery-transition-ds
SummaryAdvanced data cleaning, data wrangling and feature extraction tools for ML engineers
upload_time2020-07-01 17:23:50
maintainer
docs_urlNone
authorGigas64
requires_python>=3.6
licenseBSD
keywords wrangling ml visualisation dictionary discovery productize classification feature engineering cleansing
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            AI-STAC Discovery Transition and Feature Catalog
################################################

.. class:: no-web no-pdf

|pypi| |license| |wheel|


.. contents::

.. section-numbering::

What is AI-STAC
===============

“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone.”
— John Tukey

Augmented Intent - Single Task Accelerator components (AI-STAC) is a unique approach to data recovery, discovery, synthesis
and modeling that innovates the approach to data science and it's transition to production. it's origins came
from an incubator project that shadowed a team of Ph.D. data scientists in connection with the development and delivery
of machine learning initiatives to define measurable benefit propositions for customer success. From this, a number of
observable 'capabilities' were identified as unique and separate concerns. The challenges of the data scientist, and in
turn the production teams, were to effectively leveraging that separation of concern and distribute and loosely couple
the specialist capability needs to the appropriate skills set.

In addition the need to remove the opaque nature of the machine learning end-to-end required better transparency and
traceability, to better inform to the broadest of interested parties and be able to adapt without leaving being the
code 'sludge' of redundant ideas. AI-STAC is a disruptive innovation, changing the way we approach the challenges of
Machine Learning and Augmented Inelegance, introduces the ideas of 'Single Task Adaptive Component' around the
core concept of 'Parameterised Intent'

Main features
=============

* Machine Learning Capability Mapping
* Parametrised Intent
* Discovery Transitioning
* Feature Cataloguing
* Augmented Knowledge

Overview
========
AI-STAC is a change of approach in terms of improving productivity of the data
scientists. This approach deconstructs the machine learning discovery vertical into a set of capabilities, ideas and
knowledge.  It presents a completely novel approach to the traditional process automation and model wrapping that is
broadly offered as a solution to solve the considerable challenges that currently restrict the effectiveness of
machine learning in the enterprise business.

To achieve this, the project offers advanced and specialized programming methods that are unique in approach and novel
while maintaining familiarity within common tooling can be identified in four constructs.

1. Machine Learning Capability Mapping - Separation of capabilities, breaking the machine learning vertical into a set
of decoupled and targeted layers of discrete and refined actions that collectively present a human lead (ethical AI)
base truth to the next set of capabilities. This not only allows improved transparency of, what is, a messy and
sometimes confusing set of discovery orientated coded ideas but also loosely couples and targets activities that are,
generally, complex and specialized into identifiable and discrete capabilities that can be chained as separately
allocated activities.

2. Parametrized Intent - A unique technique extracting the ideas and thinking of the data scientist from their
discovery code and capturing it as intent with parameters that can be replayed against productionized code and data.
This decoupling and Separation of Concern between data, code and the intent of actions from that code on that data,
considerably improves time to market, code reuse, transparency of actions and the communication of ideas between data
scientists and product delivery specialists.

3. Discovery Transitioning - Discovery Transitioning - is a foundation of the sepatation of concerns between data
provisioning and feature selection. As part of the Accelerated ML discovery Vertical, Transitioning is a foundation
base truth facilitating a transparent transition of the raw canonical dataset to a fit-for-purpose canonical dataset
to enable the optimisation of discovery analysis and the identification of features-of-interest, for the data scientist
and created boundary separation of capabilities decoupling the Data Scientist for the Data Engineer. As output it also
provides 'intelligent Communication', not only to the Data Scientist through canonical fit-for-purpose datasets, but
more generally offers powerful visual discovery tools and artefact generation for production architects, data and
business SME's, Stakeholders and is the initiator of Augmented Knowledge for an enriched and transparent shared view of
the extended data knowledge.

4. Feature Cataloguing – With cross over skills within machine learning and advanced data heuristics,
investigation identified commonality and separation across customer engagements that particularly challenged our
Ph.D data scientists in their effective delivery of customer success. As a result the project designed and developed
Feature Cataloguing, a machine learning technique of extracting and engineering features and their characteristics
appropriately parameterized for model selection.  This technique implements a juxta view of how features are
characterized and presented to the modelling layer. Traditionally features are directly mapped as a representation
of the underlying data set. Feature Cataloguing treats each individual feature as its own individual set of
characteristics as its representation. The resulting outcome considerably improves experimentation, cross feature
association, even when unrelated in the original data sets, and the reuse of identified features-of-interest across
use case and business domains.

5. Augmented Knowledge - This the ability to capture information on data, activities and the rich stream of subject
matter expertise, injected into the machine learning discovery vertical to provide an Augmented n-view of the model
build. This includes security, sensitivity, data value scaling, dictionary, observations, performance, optimization,
bias, etc. This enriched view of data allows, amongst other things, improved knowledge share, AI explainability,
feature transparency, and accountability that feeds into AI ethics, and insight analysis.

Background
==========
Born out of the frustration of time constraints and the inability to show business value
within a business expectation, this project aims to provide a set of tools to quickly
produce visual and observational results. It also aims to improve the communication
outputs needed by ML delivery to talk to Pre-Sales, Stakholders, Business SME's, Data SME's
product coders and tooling engineers while still remaining within familiar code paragigms.

The package looks to build a set of outputs as part of standard data wrangling and ML exploration
that, by their nature, are familiar tools to the various reliant people and processes. For example
Data dictionaries for SME's, Visual representations for clients and stakeholders and configuration
contracts for architects, tool builders and data ingestion.

Discovery Transition
--------------------
Discovery Transition is first and key part of an end to end process of discovery, productization and tooling. It defines
the ‘intelligence’ and business differentiators of everything downstream.

To become effective in the Discovery Transition phase, the ability to be able to micro-iterate within distinct layers
enables the needed adaptive delivery and quicker returns on ML use case.

The building and discovery of an ML model can be broken down into three Separation of Concerns (SoC)
or Scope of Responsibility (SoR) for the ML engineer and ML model builder.

- Data Preparation
- Feature Engineering
- Model selection and optimisation

with a forth discipline of insight, interpretation and profiling as an outcome. these three SoC's can be perceived as
eight distinct disciplines

Conceptuasl Eight stages of Model preparation
---------------------------------------------
#. Connectivity (data sourcing and persisting, fit-for-purpose, quality, quantity, veracity, connectivity)
#. Data Discovery (filter, selection, typing, cleaning, valuing, validating)
#. Augmented Knowledge (observation, visualisation, knowledge, value scale)
#. Data Attribution (attribute mapping, quantitative attribute characterisation. predictor selection)
#. Feature Engineering (feature modelling, dirty clustering, time series, qualitative feature characterisation)
#. Feature Framing (hypothesis function, specialisation, custom model framing, model/feature selection)
#. Model Train (selection, optimisation, testing, training)
#. Model Predict (learning, feedback loops, opacity testing, insight, profiling, stabilization)

Though conceptual they do represent a set of needed disciplines and the complexity of the journey to quality output.

Layered approach and Capability Mapping
---------------------------------------
The idea behind the conceptual eight stages of Machine Learning is to layer the preparation and reuse of the activities
undertaken by the ML Data Engineer and ML Modeller. To provide a platform for micro iterations rather than a
constant repetition of repeatable tasks through the stack. It also facilitates contractual definitions between
the different disciplines that allows loose coupling and automated regeneration of the different stages of model
build. Finally it reduces the cross discipline commitments by creating a 'by-design' set of contracts targeted
at, and written in, the language of the consumer.

The concept of being able to quickly run over a single aspect of the ML discovery and then present a stable base for
the next layer to iterate against. this micro-iteration approach allows for quick to market adaptive delivery.

Getting Started
===============
The ``discovery-transition-ds`` package is a python/pandas implementation of the AI-STAC Transition component,
specifically aimed at Python, Numpy and Pandas based Data Science activities. It is build to be very light weight
in terms of package dependencies requiring nothing beyond what would be found in an basic Data Science environment.
Its designed to be used easily within multiple python based interfaces such as Jupyter, IDE or command-line python.

Installation
============

package install
---------------
The best way to install AI-STAC component packages is directly from the Python Package Index repository using pip.
All AI-STAC components are based on a pure python foundation package ``aistac-foundation``

.. code-block:: bash

    $ pip install aistac-foundation

The AI-STAC component package for the Transition is ``discovery-transition-ds`` and pip installed with:

.. code-block:: bash

    $ pip install discovery-transition-ds

if you want to upgrade your current version then using pip install upgrade with:

.. code-block:: bash

    $ pip install --upgrade discovery-transition-ds

First Time Env Setup
--------------------
In order to ease the startup of tasks a number of environment variables are available to pre-assign where and how
configuration and data can be collected. This can considerable improve the burden of setup and help in the migration
of the outcome contracts between environments.

In this section we will look at a couple of primary environment variables and demonstrate later how these are used
in the Component. In the following example we are assuming a local file reference but this is not the limit of how one
can use the environment variables to locate date from multiple different connection mediums. Examples of other
connectors include AWS S3, Hive, Redis, MongoDB, Azure Blob Storage, or specific connectors can be created very
quickly using the AS-STAC foundation abstracts.

If you are on linux or MacOS:

1. Open the current user's profile into a text editor.

.. code-block:: bash

    $> vi ~/.bash_profile.

2. Add the export command for each environment variable setting your preferred paths in this example I am setting
them to a demo projects folder

.. code-block:: bash

    # where to find the properties contracts
    export HADRON_PM_PATH=~/projects/demo/contracts

    # The default path for the source and the persisted data
    export HADRON_DEFAULT_PATH=~/projects/demo/data

3. In addition to the default environment variables you can set specific component environment variables. This is
particularly useful with the Transition component as source data tends to sit separate from our interim storage.
For Transition you replace the ``DEFAULT`` with ``TRANSITION``, and in this case specify this is the ``SOURCE`` path

.. code-block:: bash

    # specific to te transition component source path
    export HADRON_TRANSITION_SOURCE_PATH=/tmp/data/sftp

4. save your changes
5. re-run your bash_profile and check the variables have been set

.. code-block:: bash

    $> source ~/.bash_profile.
    $> env

Transition Task - Setup
=======================
The Transition Component is a 'Capability' component and a 'Separation of Concern' dealing specifically with the
transition of data from connectivity of data source to the persistence of 'data-of-interest' that has been prepared
appropriate for the language canonical, in this case 'Pandas DataFrame'.

In the following example we are assuming a local file reference and are using the default AI-STAC Connector Contracts
for Data Sourcing and Persisting, but this is not the limit of how one can use connect to data retrieval and storage.
Examples of other connectors include AWS S3, Hive, Redis, MongoDB, Azure Blob Storage, or specific connectors can be
created very quickly using the AS-STAC foundation abstracts.

Instantiation
-------------
The ``Transition`` class is the encapsulating class for the Transitioning Capability, providing a wrapper for
transitioning functionality. and imported as:

.. code-block:: python

    from ds_discovery import Transition

The easiest way to instantiate the ``Transition`` class is to use Factory Instantiation method ``.from_env(...)``
that takes advantage of our environment variables set up in the previous section. in order to differentiate each
instance of the Transition Component, we assign it a ``Task`` name that we can use going forward to retrieve
or re-create our Transition instance with all its 'Intent'

.. code-block:: python

    tr = Transition.from_env(task_name='demo')

Augmented Knowledge
-------------------
Once you have instantiated the Transition Task it is important to add a description of the task as a future remind,
for others using this task and when using the MasterLedger component (not covered in this tutorial) it allows for a
quick reference overview of all the tasks in the ledger.

.. code-block:: python

    tr.set_description("A Demo task used as an example for the Transitioning tutorial")

Note: the description should be a short summary of the task. If we need to be more verbose, and as good practice,
we can also add notes, that are timestamped and cataloged, to help augment knowledge about this
task that is carried as part of the Property Contract.

in the Transition Component notes are cataloged within five named sections:
* source - notes about the source data that help in what it is, where it came from and any SME knowledge of interest
* schema - data schemas to capture and report on the outcome data set
* observations - observations of interest or enhancement of the understanding of the task
* actions - actions needed, to be taken or have been taken within the task

each ``catalog`` can have multiple ``labels`` whick in tern can have multiple text entries, each text keyed by
timestamp. through the catalog set is fixed, ``labels`` can be any reference label

the following example adds a description to the source catalogue

.. code-block:: python

    tr.add_notes(catalog='source', label='describe', text="The source of this demo is a synthetic data set"

To retrieve the list of allowed ``catalog`` sections we use the property method:

.. code-block:: python

    tr.notes_catalog


We now have our Transition instance and had we previously set it up it will contain all the previously set
Property Contract

One-Time Connectors Settings
----------------------------
With each component task we need to set up its connectivity defining three ``Connector Contract`` which control the
loose coupling of where data is sourced and persisted to the code that uses it. Though we can define up each Connect
Contract, it is easier to take advantage of template connectors set up as part of the Factory initialisation method.

Though we can define as many Connector Contract as we like, by its nature, the Transition task has three key connectors
that need to be set up as a 'one-off' task. Once these are set they are stored in the Property Contract and thus do not
need to be set again.

Source Contract
~~~~~~~~~~~~~~~
Firstly we need to set up the 'Source Contract' that specifies the data to be sourced. Because we are taking advantage
of the environment variable ``HADRON_TRANSITION_SOURCE_PATH`` we only need to pass the source file name. In this
example we are also going to pass two 'optional' extra parameters that get passed directly to the Source reader,
``sep=`` and ``encoding=``

.. code-block:: python

    tr.set_source(uri_file='demo_data.txt', sep='\t', encoding='Latin1')


Persist Contract
~~~~~~~~~~~~~~~~
Secondly we need to specify where we are going to persist our data once we have transitioned it. Again we are going
to take advantage of what our Factory Initialisation method set up for us and allow the Transition task to define our
output based on constructed template Connector Contracts.

.. code-block:: python

    tr.set_persist()

Dictionary Contract
~~~~~~~~~~~~~~~~~~~
Finally, and optionally, we set up a Data Dictionary Connector that allows us to output a data dictionary of the source
or persist schema to a persisted state that can be shared with other parties of interest.
.. code-block:: python

    tr.set_dictionary()

Now we have set up the Connector Contracts we no longer need to reference this code again as the information as been
stored in the Property Contract. We will look later how we can report on these connectors and observe their settings

We are ready to go. The Transition task is ready to use.

Transition Task - Intent
========================

Instantiate the Task
--------------------

The easiest way to instantiate the ``Transition`` class is to use Factory Instantiation method ``.from_env(...)``
that takes advantage of our environment variables set up in the previous section. in order to differentiate each
instance of the Transition Component, we assign it a ``Task`` name that we can use going forward to retrieve
or re-create our Transition instance with all its 'Intent'

.. code-block:: python

    tr = Transition.from_env(task_name='demo')


Loading the Source Canonical
----------------------------

.. code-block:: python

    df = tr.load_source_canonical()


Canonical Reporting
-------------------

.. code-block:: python

    tr.canonical_report(df)

Parameterised Intent
--------------------
Parameterised intent is a core concept and represents the intended action and defining functions of the component.
Each method is known as a component intent and the parameters the task parameterisation of that intent. The intent
and its parameters are saved and can be replayed using the ``run_intent_pipeline(canonical)`` method

The following sections are a brief description of the intent and its parameters. To retrieve the list of available
intent methods in code run:

.. code-block:: python

    tr.intent_model.__dir__()

auto_clean_header
~~~~~~~~~~~~~~~~~
.. parsed-literal::

    def auto_clean_header(self, df, case=None, rename_map: dict=None, replace_spaces: str=None, inplace: bool=False,
                          save_intent: bool=None, intent_level: [int, str]=None):

        clean the headers of a pandas DataFrame replacing space with underscore

        :param df: the pandas.DataFrame to drop duplicates from
        :param rename_map: a from: to dictionary of headers to rename
        :param case: changes the headers to lower, upper, title, snake. if none of these then no change
        :param replace_spaces: character to replace spaces with. Default is '_' (underscore)
        :param inplace: if the passed pandas.DataFrame should be used or a deep copy
        :param save_intent: (optional) if the intent contract should be saved to the property manager
        :param intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

auto_drop_correlated
~~~~~~~~~~~~~~~~~~~~
uses 'brute force' techniques to removes highly correlated columns based on the threshold,
        set by default to 0.998.

        :df: data: the Canonical data to drop duplicates from
        :threshold: (optional) threshold correlation between columns. default 0.998
        :inc_category: (optional) if category type columns should be converted to numeric representations
        :sample_percent: a sample percentage between 0.5 and 1 to avoid over-fitting. Default is 0.85
        :random_state: a random state should be applied to the test train split. Default is None
        :inplace: if the passed Canonical, should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy Canonical,.

auto_remove_columns
~~~~~~~~~~~~~~~~~~~
auto removes columns that are np.NaN, a single value or have a predominant value greater than.

        :df: the pandas.DataFrame to auto remove
        :null_min: the minimum number of null values default to 0.998 (99.8%) nulls
        :predominant_max: the percentage max a single field predominates default is 0.998
        :nulls_list: can be boolean or a list:
                    if boolean and True then null_list equals ['NaN', 'nan', 'null', '', 'None', ' ']
                    if list then this is considered potential null values.
        :auto_contract: if the auto_category or to_category should be returned
        :test_size: a test percentage split from the df to avoid over-fitting. Default is 0 for no split
        :random_state: a random state should be applied to the test train split. Default is None
        :drop_empty_row: also drop any rows where all the values are empty
        :inplace: if to change the passed pandas.DataFrame or return a copy (see return)
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

auto_to_category
~~~~~~~~~~~~~~~~
auto categorises columns that have a max number of uniqueness with a min number of nulls
        and are object dtype

        :df: the pandas.DataFrame to auto categorise
        :unique_max: the max number of unique values in the column. default to 20
        :null_max: maximum number of null in the column between 0 and 1. default to 0.7 (70% nulls allowed)
        :fill_nulls: a value to fill nulls that then can be identified as a category type
        :nulls_list:  potential null values to replace.
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_bool_type
~~~~~~~~~~~~
converts column to bool based on the map

        :df: the Pandas.DataFrame to get the column headers from
        :bool_map: a mapping of what to make True and False
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_category_type
~~~~~~~~~~~~~~~~
converts columns to categories

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :as_num: if true returns the category as a category code
        :fill_nulls: a value to fill nulls that then can be identified as a category type
        :nulls_list:  potential null values to replace.
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_date_type
~~~~~~~~~~~~
converts columns to date types

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :as_num: if true returns number of days since 0001-01-01 00:00:00 with fraction being hours/mins/secs
        :year_first: specifies if to parse with the year first
                If True parses dates with the year first, eg 10/11/12 is parsed as 2010-11-12.
                If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil).
        :day_first: specifies if to parse with the day first
                If True, parses dates with the day first, eg %d-%m-%Y.
                If False default to the a prefered preference, normally %m-%d-%Y (but not strict)
        :date_format: if the date can't be inferred uses date format eg format='%Y%m%d'
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_float_type
~~~~~~~~~~~~~
converts columns to float type

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :precision: how many decimal places to set the return values. if None then the number is unchanged
        :fillna: { num_value, 'mean', 'mode', 'median' }. Default to np.nan
                    - If num_value, then replaces NaN with this number value
                    - If 'mean', then replaces NaN with the mean of the column
                    - If 'mode', then replaces NaN with a mode of the column. random sample if more than 1
                    - If 'median', then replaces NaN with the median of the column
        :errors: {'ignore', 'raise', 'coerce'}, default 'coerce' }. Default to 'coerce'
                    - If 'raise', then invalid parsing will raise an exception
                    - If 'coerce', then invalid parsing will be set as NaN
                    - If 'ignore', then invalid parsing will return the input
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_int_type
~~~~~~~~~~~
converts columns to int type

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :fillna: { num_value, 'mean', 'mode', 'median' }. Default to 0
                    - If num_value, then replaces NaN with this number value
                    - If 'mean', then replaces NaN with the mean of the column
                    - If 'mode', then replaces NaN with a mode of the column. random sample if more than 1
                    - If 'median', then replaces NaN with the median of the column
        :errors: {'ignore', 'raise', 'coerce'}, default 'coerce'
                    - If 'raise', then invalid parsing will raise an exception
                    - If 'coerce', then invalid parsing will be set as NaN
                    - If 'ignore', then invalid parsing will return the input
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_normalised
~~~~~~~~~~~~~
converts columns to float type

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :precision: how many decimal places to set the return values. if None then the number is unchanged
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_numeric_type
~~~~~~~~~~~~~~~
converts columns to int type

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :precision: how many decimal places to set the return values. if None then the number is unchanged
        :fillna: { num_value, 'mean', 'mode', 'median' }. Default to np.nan
                    - If num_value, then replaces NaN with this number value. Must be a value not a string
                    - If 'mean', then replaces NaN with the mean of the column
                    - If 'mode', then replaces NaN with a mode of the column. random sample if more than 1
                    - If 'median', then replaces NaN with the median of the column
        :errors: {'ignore', 'raise', 'coerce'}, default 'coerce'
                    - If 'raise', then invalid parsing will raise an exception
                    - If 'coerce', then invalid parsing will be set as NaN
                    - If 'ignore', then invalid parsing will return the input
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_remove
~~~~~~~~~
remove columns from the pandas.DataFrame

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_select
~~~~~~~~~
selects columns from the pandas.DataFrame

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

to_str_type
~~~~~~~~~~~
converts columns to object type

        :df: the Pandas.DataFrame to get the column headers from
        :headers: a list of headers to drop or filter on type
        :drop: to drop or not drop the headers
        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'
        :exclude: to exclude or include the dtypes
        :regex: a regular expression to search the headers
        :re_ignore_case: true if the regex should ignore case. Default is False
        :use_string_type: if the dtype 'string' should be used or keep as object type
        :fill_nulls: a value to fill nulls that then can be identified as a category type
        :nulls_list:  potential null values to replace.
        :nulls_list: can be boolean or a list:
                    if boolean and True then null_list equals ['NaN', 'nan', 'null', '', 'None'. np.nan, None]
                    if list then this is considered potential null values.
        :inplace: if the passed pandas.DataFrame should be used or a deep copy
        :save_intent: (optional) if the intent contract should be saved to the property manager
        :intent_level: (optional) the level of the intent,
                        If None: default's 0 unless the global intent_next_available is true then -1
                        if -1: added to a level above any current instance of the intent section, level 0 if not found
                        if int: added to the level specified, overwriting any that already exist
        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.

Persist the Transitioned Canonical
----------------------------------


Save Clean Canonical
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    tr.canonical_report(df_clean)

Save Data Dictionary
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    tr.save_dictionary(tr.canonical_report(df, stylise=False))

Run Pipeline
------------

Locally
~~~~~~~

.. code-block:: python

    df_clean = tr.intent_model.run_intent_pipeline(df)

End-to-End
~~~~~~~~~~

.. code-block:: python

    tr.run_transition_pipeline()

Transparency and Traceability
=============================

Environ Report
------------------

.. code-block:: python

    tr.report_environ()

Connectors Report
-----------------

.. code-block:: python

    tr.report_connectors()

Intent Report
-------------

.. code-block:: python

    tr.report_Intent()

Run Book Report
---------------

.. code-block:: python

    tr.report_run_book()

Notes Report
------------

.. code-block:: python

    tr.report_Notes()

Schema Report
-------------


Reference
=========

Python version
--------------

Python 2.6,  2.7 or 3.5 are not supported. Although Python 3.6 is supported, it is recommended to install
``discovery-transition-ds`` against the latest Python 3.8.x whenever possible.

Pandas version
--------------

Pandas 0.25.x and above are supported but It is highly recommended to use the latest 1.0.x release as the first
major release of Pandas.

GitHub Project
--------------
discovery-transition-ds: `<https://github.com/Gigas64/discovery-transition-ds>`_.

Change log
----------

See `CHANGELOG <https://github.com/doatridge-cs/discovery-transition-ds/blob/master/CHANGELOG.rst>`_.


Licence
-------

BSD-3-Clause: `LICENSE <https://github.com/doatridge-cs/discovery-transition-ds/blob/master/LICENSE.txt>`_.


Authors
-------

`Gigas64`_  (`@gigas64`_) created discovery-transition-ds.


.. _pip: https://pip.pypa.io/en/stable/installing/
.. _Github API: http://developer.github.com/v3/issues/comments/#create-a-comment
.. _Gigas64: http://opengrass.io
.. _@gigas64: https://twitter.com/gigas64


.. |pypi| image:: https://img.shields.io/pypi/pyversions/Django.svg
    :alt: PyPI - Python Version

.. |rdt| image:: https://readthedocs.org/projects/discovery-transition-ds/badge/?version=latest
    :target: http://discovery-transition-ds.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. |license| image:: https://img.shields.io/pypi/l/Django.svg
    :target: https://github.com/Gigas64/discovery-transition-ds/blob/master/LICENSE.txt
    :alt: PyPI - License

.. |wheel| image:: https://img.shields.io/pypi/wheel/Django.svg
    :alt: PyPI - Wheel




            

Raw data

            {
    "_id": null,
    "home_page": "http://github.com/gigas64/discovery-transition-ds",
    "name": "discovery-transition-ds",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "Wrangling ML Visualisation Dictionary Discovery Productize Classification Feature Engineering Cleansing",
    "author": "Gigas64",
    "author_email": "gigas64@opengrass.net",
    "download_url": "https://files.pythonhosted.org/packages/1d/c8/81eed0ef2917c9696aa2fcf8a2ae88b747ef2232d7afd98a5d69deefd769/discovery-transition-ds-2.9.9.tar.gz",
    "platform": "",
    "description": "AI-STAC Discovery Transition and Feature Catalog\n################################################\n\n.. class:: no-web no-pdf\n\n|pypi| |license| |wheel|\n\n\n.. contents::\n\n.. section-numbering::\n\nWhat is AI-STAC\n===============\n\n\u201cExploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone.\u201d\n\u2014 John Tukey\n\nAugmented Intent - Single Task Accelerator components (AI-STAC) is a unique approach to data recovery, discovery, synthesis\nand modeling that innovates the approach to data science and it's transition to production. it's origins came\nfrom an incubator project that shadowed a team of Ph.D. data scientists in connection with the development and delivery\nof machine learning initiatives to define measurable benefit propositions for customer success. From this, a number of\nobservable 'capabilities' were identified as unique and separate concerns. The challenges of the data scientist, and in\nturn the production teams, were to effectively leveraging that separation of concern and distribute and loosely couple\nthe specialist capability needs to the appropriate skills set.\n\nIn addition the need to remove the opaque nature of the machine learning end-to-end required better transparency and\ntraceability, to better inform to the broadest of interested parties and be able to adapt without leaving being the\ncode 'sludge' of redundant ideas. AI-STAC is a disruptive innovation, changing the way we approach the challenges of\nMachine Learning and Augmented Inelegance, introduces the ideas of 'Single Task Adaptive Component' around the\ncore concept of 'Parameterised Intent'\n\nMain features\n=============\n\n* Machine Learning Capability Mapping\n* Parametrised Intent\n* Discovery Transitioning\n* Feature Cataloguing\n* Augmented Knowledge\n\nOverview\n========\nAI-STAC is a change of approach in terms of improving productivity of the data\nscientists. This approach deconstructs the machine learning discovery vertical into a set of capabilities, ideas and\nknowledge.  It presents a completely novel approach to the traditional process automation and model wrapping that is\nbroadly offered as a solution to solve the considerable challenges that currently restrict the effectiveness of\nmachine learning in the enterprise business.\n\nTo achieve this, the project offers advanced and specialized programming methods that are unique in approach and novel\nwhile maintaining familiarity within common tooling can be identified in four constructs.\n\n1. Machine Learning Capability Mapping - Separation of capabilities, breaking the machine learning vertical into a set\nof decoupled and targeted layers of discrete and refined actions that collectively present a human lead (ethical AI)\nbase truth to the next set of capabilities. This not only allows improved transparency of, what is, a messy and\nsometimes confusing set of discovery orientated coded ideas but also loosely couples and targets activities that are,\ngenerally, complex and specialized into identifiable and discrete capabilities that can be chained as separately\nallocated activities.\n\n2. Parametrized Intent - A unique technique extracting the ideas and thinking of the data scientist from their\ndiscovery code and capturing it as intent with parameters that can be replayed against productionized code and data.\nThis decoupling and Separation of Concern between data, code and the intent of actions from that code on that data,\nconsiderably improves time to market, code reuse, transparency of actions and the communication of ideas between data\nscientists and product delivery specialists.\n\n3. Discovery Transitioning - Discovery Transitioning - is a foundation of the sepatation of concerns between data\nprovisioning and feature selection. As part of the Accelerated ML discovery Vertical, Transitioning is a foundation\nbase truth facilitating a transparent transition of the raw canonical dataset to a fit-for-purpose canonical dataset\nto enable the optimisation of discovery analysis and the identification of features-of-interest, for the data scientist\nand created boundary separation of capabilities decoupling the Data Scientist for the Data Engineer. As output it also\nprovides 'intelligent Communication', not only to the Data Scientist through canonical fit-for-purpose datasets, but\nmore generally offers powerful visual discovery tools and artefact generation for production architects, data and\nbusiness SME's, Stakeholders and is the initiator of Augmented Knowledge for an enriched and transparent shared view of\nthe extended data knowledge.\n\n4. Feature Cataloguing \u2013 With cross over skills within machine learning and advanced data heuristics,\ninvestigation identified commonality and separation across customer engagements that particularly challenged our\nPh.D data scientists in their effective delivery of customer success. As a result the project designed and developed\nFeature Cataloguing, a machine learning technique of extracting and engineering features and their characteristics\nappropriately parameterized for model selection.  This technique implements a juxta view of how features are\ncharacterized and presented to the modelling layer. Traditionally features are directly mapped as a representation\nof the underlying data set. Feature Cataloguing treats each individual feature as its own individual set of\ncharacteristics as its representation. The resulting outcome considerably improves experimentation, cross feature\nassociation, even when unrelated in the original data sets, and the reuse of identified features-of-interest across\nuse case and business domains.\n\n5. Augmented Knowledge - This the ability to capture information on data, activities and the rich stream of subject\nmatter expertise, injected into the machine learning discovery vertical to provide an Augmented n-view of the model\nbuild. This includes security, sensitivity, data value scaling, dictionary, observations, performance, optimization,\nbias, etc. This enriched view of data allows, amongst other things, improved knowledge share, AI explainability,\nfeature transparency, and accountability that feeds into AI ethics, and insight analysis.\n\nBackground\n==========\nBorn out of the frustration of time constraints and the inability to show business value\nwithin a business expectation, this project aims to provide a set of tools to quickly\nproduce visual and observational results. It also aims to improve the communication\noutputs needed by ML delivery to talk to Pre-Sales, Stakholders, Business SME's, Data SME's\nproduct coders and tooling engineers while still remaining within familiar code paragigms.\n\nThe package looks to build a set of outputs as part of standard data wrangling and ML exploration\nthat, by their nature, are familiar tools to the various reliant people and processes. For example\nData dictionaries for SME's, Visual representations for clients and stakeholders and configuration\ncontracts for architects, tool builders and data ingestion.\n\nDiscovery Transition\n--------------------\nDiscovery Transition is first and key part of an end to end process of discovery, productization and tooling. It defines\nthe \u2018intelligence\u2019 and business differentiators of everything downstream.\n\nTo become effective in the Discovery Transition phase, the ability to be able to micro-iterate within distinct layers\nenables the needed adaptive delivery and quicker returns on ML use case.\n\nThe building and discovery of an ML model can be broken down into three Separation of Concerns (SoC)\nor Scope of Responsibility (SoR) for the ML engineer and ML model builder.\n\n- Data Preparation\n- Feature Engineering\n- Model selection and optimisation\n\nwith a forth discipline of insight, interpretation and profiling as an outcome. these three SoC's can be perceived as\neight distinct disciplines\n\nConceptuasl Eight stages of Model preparation\n---------------------------------------------\n#. Connectivity (data sourcing and persisting, fit-for-purpose, quality, quantity, veracity, connectivity)\n#. Data Discovery (filter, selection, typing, cleaning, valuing, validating)\n#. Augmented Knowledge (observation, visualisation, knowledge, value scale)\n#. Data Attribution (attribute mapping, quantitative attribute characterisation. predictor selection)\n#. Feature Engineering (feature modelling, dirty clustering, time series, qualitative feature characterisation)\n#. Feature Framing (hypothesis function, specialisation, custom model framing, model/feature selection)\n#. Model Train (selection, optimisation, testing, training)\n#. Model Predict (learning, feedback loops, opacity testing, insight, profiling, stabilization)\n\nThough conceptual they do represent a set of needed disciplines and the complexity of the journey to quality output.\n\nLayered approach and Capability Mapping\n---------------------------------------\nThe idea behind the conceptual eight stages of Machine Learning is to layer the preparation and reuse of the activities\nundertaken by the ML Data Engineer and ML Modeller. To provide a platform for micro iterations rather than a\nconstant repetition of repeatable tasks through the stack. It also facilitates contractual definitions between\nthe different disciplines that allows loose coupling and automated regeneration of the different stages of model\nbuild. Finally it reduces the cross discipline commitments by creating a 'by-design' set of contracts targeted\nat, and written in, the language of the consumer.\n\nThe concept of being able to quickly run over a single aspect of the ML discovery and then present a stable base for\nthe next layer to iterate against. this micro-iteration approach allows for quick to market adaptive delivery.\n\nGetting Started\n===============\nThe ``discovery-transition-ds`` package is a python/pandas implementation of the AI-STAC Transition component,\nspecifically aimed at Python, Numpy and Pandas based Data Science activities. It is build to be very light weight\nin terms of package dependencies requiring nothing beyond what would be found in an basic Data Science environment.\nIts designed to be used easily within multiple python based interfaces such as Jupyter, IDE or command-line python.\n\nInstallation\n============\n\npackage install\n---------------\nThe best way to install AI-STAC component packages is directly from the Python Package Index repository using pip.\nAll AI-STAC components are based on a pure python foundation package ``aistac-foundation``\n\n.. code-block:: bash\n\n    $ pip install aistac-foundation\n\nThe AI-STAC component package for the Transition is ``discovery-transition-ds`` and pip installed with:\n\n.. code-block:: bash\n\n    $ pip install discovery-transition-ds\n\nif you want to upgrade your current version then using pip install upgrade with:\n\n.. code-block:: bash\n\n    $ pip install --upgrade discovery-transition-ds\n\nFirst Time Env Setup\n--------------------\nIn order to ease the startup of tasks a number of environment variables are available to pre-assign where and how\nconfiguration and data can be collected. This can considerable improve the burden of setup and help in the migration\nof the outcome contracts between environments.\n\nIn this section we will look at a couple of primary environment variables and demonstrate later how these are used\nin the Component. In the following example we are assuming a local file reference but this is not the limit of how one\ncan use the environment variables to locate date from multiple different connection mediums. Examples of other\nconnectors include AWS S3, Hive, Redis, MongoDB, Azure Blob Storage, or specific connectors can be created very\nquickly using the AS-STAC foundation abstracts.\n\nIf you are on linux or MacOS:\n\n1. Open the current user's profile into a text editor.\n\n.. code-block:: bash\n\n    $> vi ~/.bash_profile.\n\n2. Add the export command for each environment variable setting your preferred paths in this example I am setting\nthem to a demo projects folder\n\n.. code-block:: bash\n\n    # where to find the properties contracts\n    export HADRON_PM_PATH=~/projects/demo/contracts\n\n    # The default path for the source and the persisted data\n    export HADRON_DEFAULT_PATH=~/projects/demo/data\n\n3. In addition to the default environment variables you can set specific component environment variables. This is\nparticularly useful with the Transition component as source data tends to sit separate from our interim storage.\nFor Transition you replace the ``DEFAULT`` with ``TRANSITION``, and in this case specify this is the ``SOURCE`` path\n\n.. code-block:: bash\n\n    # specific to te transition component source path\n    export HADRON_TRANSITION_SOURCE_PATH=/tmp/data/sftp\n\n4. save your changes\n5. re-run your bash_profile and check the variables have been set\n\n.. code-block:: bash\n\n    $> source ~/.bash_profile.\n    $> env\n\nTransition Task - Setup\n=======================\nThe Transition Component is a 'Capability' component and a 'Separation of Concern' dealing specifically with the\ntransition of data from connectivity of data source to the persistence of 'data-of-interest' that has been prepared\nappropriate for the language canonical, in this case 'Pandas DataFrame'.\n\nIn the following example we are assuming a local file reference and are using the default AI-STAC Connector Contracts\nfor Data Sourcing and Persisting, but this is not the limit of how one can use connect to data retrieval and storage.\nExamples of other connectors include AWS S3, Hive, Redis, MongoDB, Azure Blob Storage, or specific connectors can be\ncreated very quickly using the AS-STAC foundation abstracts.\n\nInstantiation\n-------------\nThe ``Transition`` class is the encapsulating class for the Transitioning Capability, providing a wrapper for\ntransitioning functionality. and imported as:\n\n.. code-block:: python\n\n    from ds_discovery import Transition\n\nThe easiest way to instantiate the ``Transition`` class is to use Factory Instantiation method ``.from_env(...)``\nthat takes advantage of our environment variables set up in the previous section. in order to differentiate each\ninstance of the Transition Component, we assign it a ``Task`` name that we can use going forward to retrieve\nor re-create our Transition instance with all its 'Intent'\n\n.. code-block:: python\n\n    tr = Transition.from_env(task_name='demo')\n\nAugmented Knowledge\n-------------------\nOnce you have instantiated the Transition Task it is important to add a description of the task as a future remind,\nfor others using this task and when using the MasterLedger component (not covered in this tutorial) it allows for a\nquick reference overview of all the tasks in the ledger.\n\n.. code-block:: python\n\n    tr.set_description(\"A Demo task used as an example for the Transitioning tutorial\")\n\nNote: the description should be a short summary of the task. If we need to be more verbose, and as good practice,\nwe can also add notes, that are timestamped and cataloged, to help augment knowledge about this\ntask that is carried as part of the Property Contract.\n\nin the Transition Component notes are cataloged within five named sections:\n* source - notes about the source data that help in what it is, where it came from and any SME knowledge of interest\n* schema - data schemas to capture and report on the outcome data set\n* observations - observations of interest or enhancement of the understanding of the task\n* actions - actions needed, to be taken or have been taken within the task\n\neach ``catalog`` can have multiple ``labels`` whick in tern can have multiple text entries, each text keyed by\ntimestamp. through the catalog set is fixed, ``labels`` can be any reference label\n\nthe following example adds a description to the source catalogue\n\n.. code-block:: python\n\n    tr.add_notes(catalog='source', label='describe', text=\"The source of this demo is a synthetic data set\"\n\nTo retrieve the list of allowed ``catalog`` sections we use the property method:\n\n.. code-block:: python\n\n    tr.notes_catalog\n\n\nWe now have our Transition instance and had we previously set it up it will contain all the previously set\nProperty Contract\n\nOne-Time Connectors Settings\n----------------------------\nWith each component task we need to set up its connectivity defining three ``Connector Contract`` which control the\nloose coupling of where data is sourced and persisted to the code that uses it. Though we can define up each Connect\nContract, it is easier to take advantage of template connectors set up as part of the Factory initialisation method.\n\nThough we can define as many Connector Contract as we like, by its nature, the Transition task has three key connectors\nthat need to be set up as a 'one-off' task. Once these are set they are stored in the Property Contract and thus do not\nneed to be set again.\n\nSource Contract\n~~~~~~~~~~~~~~~\nFirstly we need to set up the 'Source Contract' that specifies the data to be sourced. Because we are taking advantage\nof the environment variable ``HADRON_TRANSITION_SOURCE_PATH`` we only need to pass the source file name. In this\nexample we are also going to pass two 'optional' extra parameters that get passed directly to the Source reader,\n``sep=`` and ``encoding=``\n\n.. code-block:: python\n\n    tr.set_source(uri_file='demo_data.txt', sep='\\t', encoding='Latin1')\n\n\nPersist Contract\n~~~~~~~~~~~~~~~~\nSecondly we need to specify where we are going to persist our data once we have transitioned it. Again we are going\nto take advantage of what our Factory Initialisation method set up for us and allow the Transition task to define our\noutput based on constructed template Connector Contracts.\n\n.. code-block:: python\n\n    tr.set_persist()\n\nDictionary Contract\n~~~~~~~~~~~~~~~~~~~\nFinally, and optionally, we set up a Data Dictionary Connector that allows us to output a data dictionary of the source\nor persist schema to a persisted state that can be shared with other parties of interest.\n.. code-block:: python\n\n    tr.set_dictionary()\n\nNow we have set up the Connector Contracts we no longer need to reference this code again as the information as been\nstored in the Property Contract. We will look later how we can report on these connectors and observe their settings\n\nWe are ready to go. The Transition task is ready to use.\n\nTransition Task - Intent\n========================\n\nInstantiate the Task\n--------------------\n\nThe easiest way to instantiate the ``Transition`` class is to use Factory Instantiation method ``.from_env(...)``\nthat takes advantage of our environment variables set up in the previous section. in order to differentiate each\ninstance of the Transition Component, we assign it a ``Task`` name that we can use going forward to retrieve\nor re-create our Transition instance with all its 'Intent'\n\n.. code-block:: python\n\n    tr = Transition.from_env(task_name='demo')\n\n\nLoading the Source Canonical\n----------------------------\n\n.. code-block:: python\n\n    df = tr.load_source_canonical()\n\n\nCanonical Reporting\n-------------------\n\n.. code-block:: python\n\n    tr.canonical_report(df)\n\nParameterised Intent\n--------------------\nParameterised intent is a core concept and represents the intended action and defining functions of the component.\nEach method is known as a component intent and the parameters the task parameterisation of that intent. The intent\nand its parameters are saved and can be replayed using the ``run_intent_pipeline(canonical)`` method\n\nThe following sections are a brief description of the intent and its parameters. To retrieve the list of available\nintent methods in code run:\n\n.. code-block:: python\n\n    tr.intent_model.__dir__()\n\nauto_clean_header\n~~~~~~~~~~~~~~~~~\n.. parsed-literal::\n\n    def auto_clean_header(self, df, case=None, rename_map: dict=None, replace_spaces: str=None, inplace: bool=False,\n                          save_intent: bool=None, intent_level: [int, str]=None):\n\n        clean the headers of a pandas DataFrame replacing space with underscore\n\n        :param df: the pandas.DataFrame to drop duplicates from\n        :param rename_map: a from: to dictionary of headers to rename\n        :param case: changes the headers to lower, upper, title, snake. if none of these then no change\n        :param replace_spaces: character to replace spaces with. Default is '_' (underscore)\n        :param inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :param save_intent: (optional) if the intent contract should be saved to the property manager\n        :param intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nauto_drop_correlated\n~~~~~~~~~~~~~~~~~~~~\nuses 'brute force' techniques to removes highly correlated columns based on the threshold,\n        set by default to 0.998.\n\n        :df: data: the Canonical data to drop duplicates from\n        :threshold: (optional) threshold correlation between columns. default 0.998\n        :inc_category: (optional) if category type columns should be converted to numeric representations\n        :sample_percent: a sample percentage between 0.5 and 1 to avoid over-fitting. Default is 0.85\n        :random_state: a random state should be applied to the test train split. Default is None\n        :inplace: if the passed Canonical, should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy Canonical,.\n\nauto_remove_columns\n~~~~~~~~~~~~~~~~~~~\nauto removes columns that are np.NaN, a single value or have a predominant value greater than.\n\n        :df: the pandas.DataFrame to auto remove\n        :null_min: the minimum number of null values default to 0.998 (99.8%) nulls\n        :predominant_max: the percentage max a single field predominates default is 0.998\n        :nulls_list: can be boolean or a list:\n                    if boolean and True then null_list equals ['NaN', 'nan', 'null', '', 'None', ' ']\n                    if list then this is considered potential null values.\n        :auto_contract: if the auto_category or to_category should be returned\n        :test_size: a test percentage split from the df to avoid over-fitting. Default is 0 for no split\n        :random_state: a random state should be applied to the test train split. Default is None\n        :drop_empty_row: also drop any rows where all the values are empty\n        :inplace: if to change the passed pandas.DataFrame or return a copy (see return)\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nauto_to_category\n~~~~~~~~~~~~~~~~\nauto categorises columns that have a max number of uniqueness with a min number of nulls\n        and are object dtype\n\n        :df: the pandas.DataFrame to auto categorise\n        :unique_max: the max number of unique values in the column. default to 20\n        :null_max: maximum number of null in the column between 0 and 1. default to 0.7 (70% nulls allowed)\n        :fill_nulls: a value to fill nulls that then can be identified as a category type\n        :nulls_list:  potential null values to replace.\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_bool_type\n~~~~~~~~~~~~\nconverts column to bool based on the map\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :bool_map: a mapping of what to make True and False\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_category_type\n~~~~~~~~~~~~~~~~\nconverts columns to categories\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :as_num: if true returns the category as a category code\n        :fill_nulls: a value to fill nulls that then can be identified as a category type\n        :nulls_list:  potential null values to replace.\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_date_type\n~~~~~~~~~~~~\nconverts columns to date types\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :as_num: if true returns number of days since 0001-01-01 00:00:00 with fraction being hours/mins/secs\n        :year_first: specifies if to parse with the year first\n                If True parses dates with the year first, eg 10/11/12 is parsed as 2010-11-12.\n                If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil).\n        :day_first: specifies if to parse with the day first\n                If True, parses dates with the day first, eg %d-%m-%Y.\n                If False default to the a prefered preference, normally %m-%d-%Y (but not strict)\n        :date_format: if the date can't be inferred uses date format eg format='%Y%m%d'\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_float_type\n~~~~~~~~~~~~~\nconverts columns to float type\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :precision: how many decimal places to set the return values. if None then the number is unchanged\n        :fillna: { num_value, 'mean', 'mode', 'median' }. Default to np.nan\n                    - If num_value, then replaces NaN with this number value\n                    - If 'mean', then replaces NaN with the mean of the column\n                    - If 'mode', then replaces NaN with a mode of the column. random sample if more than 1\n                    - If 'median', then replaces NaN with the median of the column\n        :errors: {'ignore', 'raise', 'coerce'}, default 'coerce' }. Default to 'coerce'\n                    - If 'raise', then invalid parsing will raise an exception\n                    - If 'coerce', then invalid parsing will be set as NaN\n                    - If 'ignore', then invalid parsing will return the input\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_int_type\n~~~~~~~~~~~\nconverts columns to int type\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :fillna: { num_value, 'mean', 'mode', 'median' }. Default to 0\n                    - If num_value, then replaces NaN with this number value\n                    - If 'mean', then replaces NaN with the mean of the column\n                    - If 'mode', then replaces NaN with a mode of the column. random sample if more than 1\n                    - If 'median', then replaces NaN with the median of the column\n        :errors: {'ignore', 'raise', 'coerce'}, default 'coerce'\n                    - If 'raise', then invalid parsing will raise an exception\n                    - If 'coerce', then invalid parsing will be set as NaN\n                    - If 'ignore', then invalid parsing will return the input\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_normalised\n~~~~~~~~~~~~~\nconverts columns to float type\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :precision: how many decimal places to set the return values. if None then the number is unchanged\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_numeric_type\n~~~~~~~~~~~~~~~\nconverts columns to int type\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :precision: how many decimal places to set the return values. if None then the number is unchanged\n        :fillna: { num_value, 'mean', 'mode', 'median' }. Default to np.nan\n                    - If num_value, then replaces NaN with this number value. Must be a value not a string\n                    - If 'mean', then replaces NaN with the mean of the column\n                    - If 'mode', then replaces NaN with a mode of the column. random sample if more than 1\n                    - If 'median', then replaces NaN with the median of the column\n        :errors: {'ignore', 'raise', 'coerce'}, default 'coerce'\n                    - If 'raise', then invalid parsing will raise an exception\n                    - If 'coerce', then invalid parsing will be set as NaN\n                    - If 'ignore', then invalid parsing will return the input\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_remove\n~~~~~~~~~\nremove columns from the pandas.DataFrame\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_select\n~~~~~~~~~\nselects columns from the pandas.DataFrame\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nto_str_type\n~~~~~~~~~~~\nconverts columns to object type\n\n        :df: the Pandas.DataFrame to get the column headers from\n        :headers: a list of headers to drop or filter on type\n        :drop: to drop or not drop the headers\n        :dtype: the column types to include or exclude. Default None else int, float, bool, object, 'number'\n        :exclude: to exclude or include the dtypes\n        :regex: a regular expression to search the headers\n        :re_ignore_case: true if the regex should ignore case. Default is False\n        :use_string_type: if the dtype 'string' should be used or keep as object type\n        :fill_nulls: a value to fill nulls that then can be identified as a category type\n        :nulls_list:  potential null values to replace.\n        :nulls_list: can be boolean or a list:\n                    if boolean and True then null_list equals ['NaN', 'nan', 'null', '', 'None'. np.nan, None]\n                    if list then this is considered potential null values.\n        :inplace: if the passed pandas.DataFrame should be used or a deep copy\n        :save_intent: (optional) if the intent contract should be saved to the property manager\n        :intent_level: (optional) the level of the intent,\n                        If None: default's 0 unless the global intent_next_available is true then -1\n                        if -1: added to a level above any current instance of the intent section, level 0 if not found\n                        if int: added to the level specified, overwriting any that already exist\n        :return: if inplace, returns a formatted cleaner contract for this method, else a deep copy pandas.DataFrame.\n\nPersist the Transitioned Canonical\n----------------------------------\n\n\nSave Clean Canonical\n~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    tr.canonical_report(df_clean)\n\nSave Data Dictionary\n~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    tr.save_dictionary(tr.canonical_report(df, stylise=False))\n\nRun Pipeline\n------------\n\nLocally\n~~~~~~~\n\n.. code-block:: python\n\n    df_clean = tr.intent_model.run_intent_pipeline(df)\n\nEnd-to-End\n~~~~~~~~~~\n\n.. code-block:: python\n\n    tr.run_transition_pipeline()\n\nTransparency and Traceability\n=============================\n\nEnviron Report\n------------------\n\n.. code-block:: python\n\n    tr.report_environ()\n\nConnectors Report\n-----------------\n\n.. code-block:: python\n\n    tr.report_connectors()\n\nIntent Report\n-------------\n\n.. code-block:: python\n\n    tr.report_Intent()\n\nRun Book Report\n---------------\n\n.. code-block:: python\n\n    tr.report_run_book()\n\nNotes Report\n------------\n\n.. code-block:: python\n\n    tr.report_Notes()\n\nSchema Report\n-------------\n\n\nReference\n=========\n\nPython version\n--------------\n\nPython 2.6,  2.7 or 3.5 are not supported. Although Python 3.6 is supported, it is recommended to install\n``discovery-transition-ds`` against the latest Python 3.8.x whenever possible.\n\nPandas version\n--------------\n\nPandas 0.25.x and above are supported but It is highly recommended to use the latest 1.0.x release as the first\nmajor release of Pandas.\n\nGitHub Project\n--------------\ndiscovery-transition-ds: `<https://github.com/Gigas64/discovery-transition-ds>`_.\n\nChange log\n----------\n\nSee `CHANGELOG <https://github.com/doatridge-cs/discovery-transition-ds/blob/master/CHANGELOG.rst>`_.\n\n\nLicence\n-------\n\nBSD-3-Clause: `LICENSE <https://github.com/doatridge-cs/discovery-transition-ds/blob/master/LICENSE.txt>`_.\n\n\nAuthors\n-------\n\n`Gigas64`_  (`@gigas64`_) created discovery-transition-ds.\n\n\n.. _pip: https://pip.pypa.io/en/stable/installing/\n.. _Github API: http://developer.github.com/v3/issues/comments/#create-a-comment\n.. _Gigas64: http://opengrass.io\n.. _@gigas64: https://twitter.com/gigas64\n\n\n.. |pypi| image:: https://img.shields.io/pypi/pyversions/Django.svg\n    :alt: PyPI - Python Version\n\n.. |rdt| image:: https://readthedocs.org/projects/discovery-transition-ds/badge/?version=latest\n    :target: http://discovery-transition-ds.readthedocs.io/en/latest/?badge=latest\n    :alt: Documentation Status\n\n.. |license| image:: https://img.shields.io/pypi/l/Django.svg\n    :target: https://github.com/Gigas64/discovery-transition-ds/blob/master/LICENSE.txt\n    :alt: PyPI - License\n\n.. |wheel| image:: https://img.shields.io/pypi/wheel/Django.svg\n    :alt: PyPI - Wheel\n\n\n\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "Advanced data cleaning, data wrangling and feature extraction tools for ML engineers",
    "version": "2.9.9",
    "split_keywords": [
        "wrangling",
        "ml",
        "visualisation",
        "dictionary",
        "discovery",
        "productize",
        "classification",
        "feature",
        "engineering",
        "cleansing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "bd2cbb6dea9ba82cb399d668f31de855",
                "sha256": "c86da7dd1f8e1ae202215d654593243ce2711b99aecd35413578feed77864eea"
            },
            "downloads": -1,
            "filename": "discovery_transition_ds-2.9.9-py37-none-any.whl",
            "has_sig": false,
            "md5_digest": "bd2cbb6dea9ba82cb399d668f31de855",
            "packagetype": "bdist_wheel",
            "python_version": "py37",
            "requires_python": ">=3.6",
            "size": 112173,
            "upload_time": "2020-07-01T17:23:48",
            "upload_time_iso_8601": "2020-07-01T17:23:48.041662Z",
            "url": "https://files.pythonhosted.org/packages/d9/12/9e53478fb2e908d1505bc6f91eb68322fe1b22878539715f233590d8e185/discovery_transition_ds-2.9.9-py37-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "70e2e2e35745b1985641b491d52c5e30",
                "sha256": "8807730e5312beb24dd8ab4d308485d6ace17fd4dbd02b417154bf80219ab171"
            },
            "downloads": -1,
            "filename": "discovery-transition-ds-2.9.9.tar.gz",
            "has_sig": false,
            "md5_digest": "70e2e2e35745b1985641b491d52c5e30",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 112337,
            "upload_time": "2020-07-01T17:23:50",
            "upload_time_iso_8601": "2020-07-01T17:23:50.574080Z",
            "url": "https://files.pythonhosted.org/packages/1d/c8/81eed0ef2917c9696aa2fcf8a2ae88b747ef2232d7afd98a5d69deefd769/discovery-transition-ds-2.9.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2020-07-01 17:23:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": null,
    "github_project": "gigas64",
    "error": "Could not fetch GitHub repository",
    "lcname": "discovery-transition-ds"
}
        
Elapsed time: 0.14501s