:Name: AdvancedAnalytics
:Version: 1.39
:Summary: Python support for 'The Art and Science of Data Analytics'
:Home page: https://github.com/tandonneur/AdvancedAnalytics
:Author: Edward R Jones
:Upload time: 2023-03-31 00:09:38
:Keywords: analytics, data map, preprocessing, postprocessing, NLTK,
    Sci-Learn, sklearn, StatsModels, web scraping, word cloud, regression,
    decision trees, random forest, neural network, cross validation,
    topic analysis, sentiment analysis, natural language processing, NLP

AdvancedAnalytics
=================

A collection of Python modules, classes, and methods for simplifying the use of machine learning solutions.  **AdvancedAnalytics** provides easy access to advanced tools in **Sci-Learn**, **NLTK**, and other machine learning packages.  **AdvancedAnalytics** was developed to simplify learning Python with the book *The Art and Science of Data Analytics*.

Description
===========

From a high-level view, building machine learning applications typically proceeds through three stages:

    1. Data Preprocessing
    2. Modeling or Analytics
    3. Postprocessing

The classes and methods in **AdvancedAnalytics** primarily support the first and last stages of machine learning applications. 

Data scientists report that they spend 80% of their total effort in the first and last stages. The first stage, *data preprocessing*, is concerned with preparing the data for analysis.  This includes:

    1. identifying and correcting outliers, 
    2. imputing missing values, and 
    3. encoding data. 

The last stage, *solution postprocessing*, involves developing graphic summaries of the solution and metrics for evaluating its quality.
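
The three preprocessing tasks listed above can be sketched in plain Python. This is a minimal illustration of the ideas, not the package's API; the data, column names, and salary bounds are made up for the example.

```python
# A stdlib-only sketch of the three preprocessing tasks:
# outlier handling, imputation, and encoding.
from statistics import mean

rows = [{"salary": 52000.0, "dept": "HR"},
        {"salary": 9_000_000.0, "dept": "Sales"},   # out-of-range outlier
        {"salary": None, "dept": "Marketing"}]      # missing value

LO, HI = 20000.0, 2000000.0  # valid interval for salary

# 1. Outliers: treat values outside the valid interval as missing.
for r in rows:
    if r["salary"] is not None and not (LO <= r["salary"] <= HI):
        r["salary"] = None

# 2. Imputation: replace missing values with the mean of observed ones.
observed = [r["salary"] for r in rows if r["salary"] is not None]
fill = mean(observed)
for r in rows:
    if r["salary"] is None:
        r["salary"] = fill

# 3. Encoding: one-hot encode the nominal column.
levels = sorted({r["dept"] for r in rows})
for r in rows:
    for lv in levels:
        r[f"dept_{lv}"] = int(r["dept"] == lv)
```

The `ReplaceImputeEncode` class described below performs these same steps on a pandas data frame, driven by a data map of column types and valid ranges.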

Documentation and Examples
============================

The API and documentation for all classes and examples are available at https://github.com/tandonneur/AdvancedAnalytics/. 

Usage
=====

Currently, the most popular usage is supporting solutions developed with these advanced machine learning packages:

    * Sci-Learn
    * StatsModels
    * NLTK

The intention is to expand this list to other packages.  Below is a simple decision tree regression example that uses the data map structure to preprocess data:

.. code-block:: python

    from AdvancedAnalytics.ReplaceImputeEncode import DT
    from AdvancedAnalytics.ReplaceImputeEncode import ReplaceImputeEncode
    from AdvancedAnalytics.Tree import tree_regressor
    from sklearn.tree import DecisionTreeRegressor, export_graphviz 
    # Data Map Using DT, Data Types
    data_map = {
        "Salary":         [DT.Interval, (20000.0, 2000000.0)],
        "Department":     [DT.Nominal, ("HR", "Sales", "Marketing")],
        "Classification": [DT.Nominal, (1, 2, 3, 4, 5)],
        "Years":          [DT.Interval, (18, 60)]}
    # Preprocess data from data frame df
    rie = ReplaceImputeEncode(data_map=data_map, interval_scaling=None,
                              nominal_encoding="SAS", drop=True)
    encoded_df = rie.fit_transform(df)
    y = encoded_df["Salary"]
    X = encoded_df.drop("Salary", axis=1)
    # "gini" is a classification criterion; regressors use "squared_error"
    dt = DecisionTreeRegressor(criterion="squared_error", max_depth=4,
                               min_samples_split=5, min_samples_leaf=5)
    dt = dt.fit(X, y)
    tree_regressor.display_importance(dt, encoded_df.columns)
    tree_regressor.display_metrics(dt, X, y)

Current Modules and Classes
=============================

ReplaceImputeEncode
    Classes for Data Preprocessing
        * DT defines new data types used in the data dictionary
        * ReplaceImputeEncode a class for data preprocessing

Regression
    Classes for Linear and Logistic Regression
        * linreg support for linear regression
        * logreg support for logistic regression
        * stepwise a variable selection class
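
To make the idea behind a stepwise class concrete, here is a hedged, stdlib-only sketch of forward stepwise selection. It is not the package's *stepwise* API; it greedily adds the predictor whose one-variable fit best explains the current residuals.

```python
# Forward stepwise selection sketch (illustrative, not AdvancedAnalytics).
from statistics import mean

def simple_fit(x, y):
    """Least-squares intercept/slope for one (non-constant) predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def r2(x, y):
    """R**2 of a simple one-predictor least-squares fit."""
    ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
    if ss_tot < 1e-12:          # nothing left to explain
        return 0.0
    a, b = simple_fit(x, y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot

def forward_select(X, y, threshold=0.1):
    """X: dict of name -> column.  Add predictors while the R**2 of a
    fit against the current residuals stays above threshold."""
    selected, resid = [], list(y)
    while True:
        scores = {n: r2(col, resid)
                  for n, col in X.items() if n not in selected}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        a, b = simple_fit(X[best], resid)
        resid = [ri - (a + b * xi) for ri, xi in zip(resid, X[best])]
        selected.append(best)
    return selected
```

A full stepwise implementation would use p-values or information criteria and refit the joint model at each step; the greedy residual trick above just shows the control flow.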

Tree
    Classes for Decision Tree Solutions
        * tree_regressor support for regressor decision trees
        * tree_classifier support for classification decision trees

Forest
    Classes for Random Forests
        * forest_regressor support for regressor random forests
        * forest_classifier support for classification random forests

NeuralNetwork
    Classes for Neural Networks
        * nn_regressor support for regressor neural networks
        * nn_classifier support for classification neural networks

Text
    Classes for Text Analytics
        * text_analysis support for topic analysis
        * text_plot for word clouds
        * sentiment_analysis support for sentiment analysis

Internet
    Classes for Internet Applications
        * scrape support for web scraping
        * metrics a class for solution metrics

Installation and Dependencies
=============================

**AdvancedAnalytics** is designed to work on any operating system running Python 3.  It can be installed using **pip** or **conda**.

.. code-block:: bash

    pip install AdvancedAnalytics
    # or
    conda install -c dr.jones AdvancedAnalytics

General Dependencies
    There are a few dependencies.  Most classes import one or more modules
    from **Sci-Learn**, referenced as *sklearn* in module imports, and
    **StatsModels**.  Both are installed with the current version of
    **Anaconda**.

Installed with AdvancedAnalytics
    Most packages used by **AdvancedAnalytics** are installed
    automatically along with it.  These consist of the following
    packages.

        * statsmodels
        * scikit-learn
        * scikit-image
        * nltk
        * pydotplus

Other Dependencies
    The *Tree* and *Forest* modules plot decision trees and importance
    metrics using the **pydotplus** and **graphviz** packages.  These
    should also be automatically installed with **AdvancedAnalytics**.

    However, the **graphviz** install is sometimes not fully complete 
    with the conda install.  It may require an additional pip install.

    .. code-block:: bash

        pip install graphviz
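
Because the *Tree* and *Forest* plots fail when the graphviz ``dot`` executable is missing, a quick check can save debugging time. This helper is illustrative, not part of **AdvancedAnalytics**:

```python
# Check whether the graphviz "dot" executable is on the PATH; pydotplus
# needs it to render the decision tree plots described above.
import shutil

def graphviz_available():
    """Return True if the graphviz 'dot' executable can be found."""
    return shutil.which("dot") is not None

if not graphviz_available():
    print("graphviz 'dot' not found; 'pip install graphviz' plus the "
          "system graphviz binaries may be needed")
```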

Text Analytics Dependencies
    The *TextAnalytics* module uses the **NLTK**, **Sci-Learn**, and
    **wordcloud** packages.  Usually these are also installed
    automatically with **AdvancedAnalytics**.  You can verify they are
    installed using the following commands.

    .. code-block:: bash

        conda list nltk
        conda list scikit-learn
        conda list wordcloud

    However, when the **NLTK** package is installed, it does not 
    install the data used by the package.  In order to load the
    **NLTK** data run the following code once before using the 
    *TextAnalytics* module.

    .. code-block:: python

        # The following NLTK downloads only need to be run once
        import nltk
        nltk.download("punkt")
        nltk.download("averaged_perceptron_tagger")
        nltk.download("stopwords")
        nltk.download("wordnet")

    The **wordcloud** package also uses a little-known package,
    **tinysegmenter** version 0.3.  Run the following code to ensure it
    is installed.

    .. code-block:: bash

        conda install -c conda-forge tinysegmenter==0.3
        # or
        pip install tinysegmenter==0.3
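
The one-time **NLTK** downloads shown earlier can be made idempotent with a small helper that fetches a corpus only when it is missing. This is an illustrative sketch, not part of the package; the resource paths follow NLTK's conventional layout.

```python
# Download an NLTK resource only if it is not already present.
def ensure_nltk_data(resources=(("punkt", "tokenizers/punkt"),
                                ("stopwords", "corpora/stopwords"),
                                ("wordnet", "corpora/wordnet"))):
    import nltk  # imported lazily so this helper is cheap to define
    for name, path in resources:
        try:
            nltk.data.find(path)   # raises LookupError when missing
        except LookupError:
            nltk.download(name)
```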

Internet Dependencies
    The *Internet* module contains a *scrape* class with functions for
    scraping newsfeeds.  Some of these use the **newspaper3k** package,
    which should be automatically installed with **AdvancedAnalytics**.

    However, it also uses the **newsapi-python** package, which is not
    automatically installed.  If you intend to use this news scraping
    tool, install the package with the following commands:

    .. code-block:: bash

        conda install -c conda-forge newsapi-python
        # or
        pip install newsapi-python

    In addition, the newsapi service is provided by a commercial company,
    https://newsapi.org.  You will need to register with them to obtain
    an *API* key, which is required to access the service.  This is free
    of charge for developers, but there is a fee if *newsapi* is used to
    broadcast news within an application or at a website.

Code of Conduct
---------------

Everyone interacting in the AdvancedAnalytics project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct: https://www.pypa.io/en/latest/code-of-conduct/ .



            
