:Name: nasa-mika
:Version: 1.0.1
:Home page: https://github.com/nasa/mika.git
:Summary: Manager for Intelligent Knowledge Access (MIKA)
:Upload time: 2023-10-23 22:31:43
:Maintainer: sequoiarose
:Authors: Hannah Walsh and Sequoia Andrade
:Requires Python: >=3.8
:Keywords: natural language processing, knowledge management, topic modeling

Overview
========

**MIKA** (Manager for Intelligent Knowledge Access) is a toolkit intended to assist design-time risk 
analysis and safety assurance via advanced natural language processing capabilities. 

The full documentation is available at: https://nasa.github.io/mika/ 

State-of-the-art natural language processing (NLP) techniques enable new ways to access safety-relevant 
knowledge available in text-based documents. MIKA packages advanced NLP techniques and uses models 
specially trained for engineering applications to help engineers tap into the knowledge available in
safety reports, accident reports, incident reports, lessons learned documents, and other engineering 
documents.

To this end, the MIKA open-source toolkit has been developed for the following uses:

#. Enabling rapid exploration of a set of text-based engineering documents

#. Analyzing large, unstructured datasets, while flexibly exploiting structure in the data when it is 
   available

#. Increasing the value of engineering documents through adding metadata, analyses, and summaries

Key Features
------------
MIKA includes two key capabilities, Knowledge Discovery and Information Retrieval, for exploring text-based 
repositories. Both use BERT models as a backbone for multiple functions. 

Knowledge Discovery (KD) enables the user to extract useful, meaningful information from narrative-based 
engineering documents. This includes both supervised and unsupervised methods, such as:

   #. A variety of topic modeling methods

   #. Custom named-entity recognition (NER) for extracting a Failure Modes and Effects Analysis (FMEA)-style table

   #. The ability to analyze trends in hazards or failures
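
As a rough illustration of the trend analysis idea, and not of MIKA's own API, hazard mentions can be 
counted per year once each report has been tagged with the hazards it describes. The records, hazard 
names, and the ``hazard_counts_by_year`` helper below are hypothetical.

.. code-block:: python

    from collections import Counter, defaultdict

    # Hypothetical records: (report year, hazards identified in the report).
    reports = [
        (2019, ["battery failure", "loss of link"]),
        (2020, ["battery failure"]),
        (2020, ["battery failure", "bird strike"]),
        (2021, ["loss of link"]),
    ]

    def hazard_counts_by_year(records):
        """Return {hazard: Counter({year: number of reports mentioning it})}."""
        trends = defaultdict(Counter)
        for year, hazards in records:
            for hazard in set(hazards):  # count each hazard once per report
                trends[hazard][year] += 1
        return trends

    trends = hazard_counts_by_year(reports)
    for hazard in sorted(trends):
        print(hazard, dict(sorted(trends[hazard].items())))

The per-year counts can then be plotted or tested for a change over time.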

Information Retrieval (IR) enables the user to search a set of documents and obtain relevant documents 
or passages according to their query. This includes:

   #. An information retrieval pipeline using a bi-encoder and cross-encoder with options for users to 
      choose from pretrained or custom models
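
The two-stage retrieve-then-rerank pattern described above can be sketched in plain Python. This is not 
MIKA's API: the two scoring functions are deliberately simple stand-ins (bag-of-words cosine similarity in 
place of a bi-encoder, a term-overlap score in place of a cross-encoder) so that the example runs without 
downloading any models. In practice each would be replaced by a trained BERT-based model. All function 
names and passages are hypothetical.

.. code-block:: python

    import math
    from collections import Counter

    def embed(text):
        """Stand-in for a bi-encoder: encode each text independently."""
        return Counter(text.lower().split())

    def cosine(a, b):
        """Cosine similarity between two bag-of-words vectors."""
        dot = sum(a[t] * b[t] for t in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def pair_score(query, passage):
        """Stand-in for a cross-encoder: score query and passage jointly."""
        q_terms = set(query.lower().split())
        p_terms = set(passage.lower().split())
        return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0

    def search(query, passages, top_k=3, top_n=1):
        # Stage 1: fast retrieval; passage vectors could be precomputed.
        q_vec = embed(query)
        candidates = sorted(
            passages, key=lambda p: cosine(q_vec, embed(p)), reverse=True
        )[:top_k]
        # Stage 2: slower, more precise re-ranking of the short candidate list.
        return sorted(
            candidates, key=lambda p: pair_score(query, p), reverse=True
        )[:top_n]

    passages = [
        "The battery overheated during the flight test",
        "Loss of communication link was reported near the ridge",
        "Crew reported smoke from the battery compartment",
    ]
    results = search("battery overheated", passages)
    print(results)

The split matters because the first stage can score a large collection cheaply, while the more expensive 
joint scoring is applied only to the few candidates that survive it.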

Installation
------------
The latest version of MIKA is available from the NASA GitHub organization and can be cloned from the MIKA 
GitHub page using:

.. code-block:: bash

    git clone https://github.com/nasa/mika.git

Prerequisites
-------------
MIKA uses Python 3 and has been tested on Python >= 3.8. We recommend installing PyTorch via Anaconda first and configuring it for GPU use if desired. 
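
A quick way to confirm the environment before installing is sketched below. It assumes nothing about 
MIKA itself: it checks the interpreter version against the tested minimum and, only if PyTorch is already 
installed, asks it whether a CUDA GPU is visible.

.. code-block:: python

    import importlib.util
    import sys

    # MIKA states that it has been tested on Python >= 3.8.
    python_ok = sys.version_info >= (3, 8)
    print("Python version OK:", python_ok)

    # Import torch only if it is already installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch
        gpu_available = torch.cuda.is_available()
    else:
        gpu_available = False
    print("CUDA GPU visible to PyTorch:", gpu_available)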

MIKA can be installed from PyPI, where the package is published as ``nasa-mika``, using:

.. code-block:: bash

    pip install nasa-mika

Alternatively, you can clone MIKA manually and install the requirements. MIKA requires the following packages, and their dependencies, as listed in requirements.txt:

.. code-block:: text

    BERTopic
    datasets
    gensim
    matplotlib
    nltk
    numpy
    octis
    pandas
    pathlib
    pingouin
    pkg_resources
    pyLDAvis
    regex
    scikit-learn
    scipy
    seaborn
    sentence-transformers
    spacy
    symspellpy
    tomotopy
    torch
    transformers
    wordcloud

These can be installed with pip.
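
To see which of the packages listed above are still missing from an environment, a small check along 
these lines can help. It is only a sketch: the ``missing_distributions`` helper is hypothetical, and the 
short ``required`` list is an illustrative subset of the list above, using PyPI distribution names rather 
than import names.

.. code-block:: python

    from importlib import metadata

    def missing_distributions(names):
        """Return the names that have no installed distribution metadata."""
        missing = []
        for name in names:
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
        return missing

    # Illustrative subset of the third-party packages listed above.
    required = ["numpy", "pandas", "scikit-learn", "torch", "transformers"]
    print("Not installed:", missing_distributions(required))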

Additional packages that should be installed for optional functions include:

.. code-block:: text

    graphviz #(to plot hierarchical topic models)
    pickle   #(to save results)
    jupyter notebook #(to view examples in the repository)

Support
-------
MIKA is considered research code and is under development to refine features, add new capabilities, and 
improve workflows. Certain functions may change over time. Please contact the contributors if you 
encounter any bugs or issues.

Contributors
------------
`Hannah Walsh <https://github.com/walshh>`_ : Semantic Search capability, Custom Information Retrieval 
capability, Topic Model Plus, Data utility, Documentation

`Sequoia Andrade <https://github.com/sequoiarose>`_ : FMEA capability, custom NER, Trend Analysis, Topic
Model Plus, Data utility, Dataset-specific utilities, Code Review, Documentation


Notices
-------

Copyright © 2023 United States Government as represented by the Administrator of the National Aeronautics and Space Administration.  All Rights Reserved.

Disclaimers
~~~~~~~~~~~

No Warranty: THE SUBJECT SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR FREEDOM FROM INFRINGEMENT, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL BE ERROR FREE, OR ANY WARRANTY THAT DOCUMENTATION, IF PROVIDED, WILL CONFORM TO THE SUBJECT SOFTWARE. THIS AGREEMENT DOES NOT, IN ANY MANNER, CONSTITUTE AN ENDORSEMENT BY GOVERNMENT AGENCY OR ANY PRIOR RECIPIENT OF ANY RESULTS, RESULTING DESIGNS, HARDWARE, SOFTWARE PRODUCTS OR ANY OTHER APPLICATIONS RESULTING FROM USE OF THE SUBJECT SOFTWARE.  FURTHER, GOVERNMENT AGENCY DISCLAIMS ALL WARRANTIES AND LIABILITIES REGARDING THIRD-PARTY SOFTWARE, IF PRESENT IN THE ORIGINAL SOFTWARE, AND DISTRIBUTES IT "AS IS."

Waiver and Indemnity:  RECIPIENT AGREES TO WAIVE ANY AND ALL CLAIMS AGAINST THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT.  IF RECIPIENT'S USE OF THE SUBJECT SOFTWARE RESULTS IN ANY LIABILITIES, DEMANDS, DAMAGES, EXPENSES OR LOSSES ARISING FROM SUCH USE, INCLUDING ANY DAMAGES FROM PRODUCTS BASED ON, OR RESULTING FROM, RECIPIENT'S USE OF THE SUBJECT SOFTWARE, RECIPIENT SHALL INDEMNIFY AND HOLD HARMLESS THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT, TO THE EXTENT PERMITTED BY LAW.  RECIPIENT'S SOLE REMEDY FOR ANY SUCH MATTER SHALL BE THE IMMEDIATE, UNILATERAL TERMINATION OF THIS AGREEMENT. 



            
