nasa-mika


:Name: nasa-mika
:Version: 1.0.3
:Home page: https://github.com/nasa/mika.git
:Summary: Manager for Intelligent Knowledge Access (MIKA)
:Upload time: 2024-07-25 20:47:22
:Maintainer: sequoiarose
:Author: Hannah Walsh and Sequoia Andrade
:Requires Python: >=3.8
:Keywords: natural language processing, knowledge management, topic modeling

Overview
========

**MIKA** (Manager for Intelligent Knowledge Access) is a toolkit intended to assist design-time risk 
analysis and safety assurance via advanced natural language processing capabilities. 

The full documentation is available at: https://nasa.github.io/mika/ 

State-of-the-art natural language processing (NLP) techniques enable new ways to access safety-relevant 
knowledge available in text-based documents. MIKA packages advanced NLP techniques and uses models 
specially trained for engineering applications to allow engineers to better tap into knowledge available in
safety reports, accident reports, incident reports, lessons learned documents, and other engineering 
documents.

To this end, the MIKA open-source toolkit has been developed for the following uses:

#. Enabling rapid exploration of a set of text-based engineering documents

#. Analyzing large, unstructured datasets, or exploiting structure in data when it is available 
   (flexibility)

#. Increasing the value of engineering documents through adding metadata, analyses, and summaries

Key Features
------------
MIKA includes two key capabilities, Knowledge Discovery and Information Retrieval, for exploring text-based 
repositories. Both use BERT models as a backbone for multiple functions. 

Knowledge Discovery (KD) enables the user to extract useful, meaningful information from narrative-based 
engineering documents. This includes both supervised and unsupervised methods, such as:

   #. A variety of topic modeling methods

   #. Custom named-entity recognition (NER) for extracting a Failure Modes and Effects Analysis (FMEA)-style table

   #. The ability to analyze trends in hazards or failures
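
As a rough illustration of the trend-analysis idea (not MIKA's own API), the sketch below counts how often each hazard label appears per year using only the standard library. The function name ``hazard_counts_by_year``, the record fields ``year`` and ``hazard``, and the sample records are all hypothetical.

.. code-block:: python

    from collections import Counter, defaultdict

    def hazard_counts_by_year(records):
        """Return {hazard: {year: count}} for an iterable of report records."""
        counts = defaultdict(Counter)
        for record in records:
            counts[record["hazard"]][record["year"]] += 1
        # Convert to plain dicts so the result prints and compares cleanly.
        return {hazard: dict(by_year) for hazard, by_year in counts.items()}

    # Hypothetical, hand-made records; real inputs would come from labeled reports.
    reports = [
        {"year": 2019, "hazard": "engine failure"},
        {"year": 2019, "hazard": "engine failure"},
        {"year": 2020, "hazard": "engine failure"},
        {"year": 2020, "hazard": "loss of communication"},
    ]
    trends = hazard_counts_by_year(reports)

A table like this is only a starting point; normalizing by the number of reports per year is usually needed before reading a rise or fall as a real trend.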

Information Retrieval (IR) enables the user to search a set of documents and obtain relevant documents 
or passages according to their query. This includes:

   #. An information retrieval pipeline using a bi-encoder and cross-encoder with options for users to 
      choose from pretrained or custom models
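
The two-stage idea behind a bi-encoder plus cross-encoder pipeline can be sketched with toy scorers. This is a minimal illustration under stated assumptions, not MIKA's implementation: ``embed`` is a bag-of-words stand-in for a bi-encoder, ``rerank_score`` is a word-overlap stand-in for a cross-encoder, and every name below is hypothetical.

.. code-block:: python

    import math
    from collections import Counter

    def embed(text):
        """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
        return Counter(text.lower().split())

    def cosine(a, b):
        """Cosine similarity between two sparse count vectors."""
        dot = sum(a[t] * b[t] for t in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def rerank_score(query, doc):
        """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly
        as the fraction of query words that appear in the document."""
        q_words = query.lower().split()
        d_words = set(doc.lower().split())
        return sum(w in d_words for w in q_words) / max(len(q_words), 1)

    def retrieve_and_rerank(query, docs, top_k=2):
        """Stage 1: keep the top_k docs by embedding similarity.
        Stage 2: re-order those candidates with the pairwise scorer."""
        doc_vecs = [embed(d) for d in docs]  # could be precomputed once per corpus
        q_vec = embed(query)
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(q_vec, doc_vecs[i]),
                        reverse=True)
        candidates = [docs[i] for i in ranked[:top_k]]
        return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)

    docs = [
        "hydraulic pump failure during taxi",
        "crew reported radio loss",
        "pump seal leak caused hydraulic failure",
    ]
    results = retrieve_and_rerank("hydraulic pump failure", docs, top_k=2)

The split matters because document embeddings can be computed once and compared cheaply, while the pairwise scorer looks at the query and document together and is typically run only on the short candidate list.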

Installation
------------

MIKA is available on PyPI and can be installed with:

.. code-block:: bash

    pip install nasa-mika

Note that some users have had issues with certain MIKA dependencies, such as HDBSCAN. If you encounter an issue installing a dependency via pip, we recommend first installing the dependency using conda prior to installing MIKA, for example:

.. code-block:: bash

    conda install -c conda-forge hdbscan
    pip install nasa-mika

After installing MIKA, download the NLTK ``words`` corpus by running the following in Python:

.. code-block:: python

    import nltk
    nltk.download('words')

Also, download the spaCy transformer model by running the following command:

.. code-block:: bash

    python -m spacy download en_core_web_trf
    
Now you can import anything in MIKA:

.. code-block:: python

    from mika.kd import FMEA
    from mika.kd import Topic_Model_plus
    from mika.kd.trend_analysis import *
    from mika.kd.NER import *
    from mika.ir import search

    from mika.utils import Data
    from mika.utils.SAFECOM import *
    from mika.utils.SAFENET import *
    from mika.utils.LLIS import *
    from mika.utils.ICS import *

The latest version of MIKA is also available from the NASA GitHub page:

.. code-block:: bash
    
    git clone https://github.com/nasa/mika.git

MIKA includes three custom large language models, which are available on the NASA Hugging Face page at: https://huggingface.co/NASA-AIML 

Examples in MIKA use specific datasets that are NOT included in the software distribution. They can be created by following the instructions in the documentation at: https://nasa.github.io/mika/data.html 

Prerequisites
-------------
MIKA uses Python 3 and has been tested on Python>=3.8. We recommend installing PyTorch via Anaconda first and configuring it for GPU use if desired. If installing via pip, all prerequisites are included.
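
A quick way to confirm the interpreter version and whether PyTorch can see a GPU is sketched below. The helper names ``python_is_supported`` and ``gpu_is_available`` are hypothetical and not part of MIKA; ``torch.cuda.is_available()`` is the standard PyTorch call, and the check assumes a CUDA build is what "GPU use" means on your machine.

.. code-block:: python

    import sys

    MIN_PYTHON = (3, 8)  # minimum version stated in this section

    def python_is_supported(version_info=None):
        """Return True if the (major, minor) version meets the stated minimum."""
        if version_info is None:
            version_info = sys.version_info
        return tuple(version_info[:2]) >= MIN_PYTHON

    def gpu_is_available():
        """Return True if PyTorch is installed and reports a usable CUDA device."""
        try:
            import torch
        except ImportError:
            return False
        return bool(torch.cuda.is_available())

    print("Python OK:", python_is_supported())
    print("CUDA GPU available:", gpu_is_available())
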

Alternatively, you can manually clone MIKA and install the requirements. MIKA requires the following packages and their dependencies outlined in requirements.txt:

.. code-block:: text

    BERTopic
    datasets
    gensim
    matplotlib
    nltk
    numpy
    octis
    pandas
    pathlib
    pingouin
    pkg_resources
    pyLDAvis
    regex
    scikit-learn
    scipy
    seaborn
    sentence-transformers
    spacy
    symspellpy
    tomotopy
    torch
    transformers
    wordcloud

These can be installed with pip.
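
To see which of these packages are still missing from an environment, a small standard-library check can help. This is a sketch, not a MIKA utility: ``missing_modules`` is a hypothetical helper, and the list uses import names, a few of which differ from the distribution names above (``bertopic``, ``sklearn``, ``sentence_transformers``).

.. code-block:: python

    from importlib.util import find_spec

    # Import names for the packages listed above; some differ from the
    # distribution name (e.g. scikit-learn is imported as sklearn).
    REQUIRED_MODULES = [
        "bertopic", "datasets", "gensim", "matplotlib", "nltk", "numpy", "octis",
        "pandas", "pathlib", "pingouin", "pkg_resources", "pyLDAvis", "regex",
        "sklearn", "scipy", "seaborn", "sentence_transformers", "spacy",
        "symspellpy", "tomotopy", "torch", "transformers", "wordcloud",
    ]

    def missing_modules(names):
        """Return the top-level module names that cannot be found for import."""
        return [name for name in names if find_spec(name) is None]

    print("Missing:", missing_modules(REQUIRED_MODULES))

The check only confirms that a module can be located; it does not verify versions or that the import itself succeeds.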

Additional packages that should be downloaded for optional functions include:

.. code-block:: text
    
    graphviz #(to plot hierarchical topic models)
    pickle   #(to save results; part of the Python standard library)
    jupyter notebook #(to view examples in the repository)

Support
-------
MIKA is considered research code and is under development to refine features, add new capabilities, and 
improve workflows. Certain functions may change over time. Please contact the contributors if you encounter 
any bugs or issues.

Contributors
------------
`Hannah Walsh <https://github.com/walshh>`_ : Semantic Search capability, Custom Information Retrieval 
capability, Topic Model Plus, Data utility, Documentation

`Sequoia Andrade <https://github.com/sequoiarose>`_ : FMEA capability, custom NER, Trend Analysis, Topic
Model Plus, Data utility, Dataset-specific utilities, Code Review, Documentation


Notices
-------

Copyright © 2023 United States Government as represented by the Administrator of the National Aeronautics and Space Administration.  All Rights Reserved.

Disclaimers
~~~~~~~~~~~

No Warranty: THE SUBJECT SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR FREEDOM FROM INFRINGEMENT, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL BE ERROR FREE, OR ANY WARRANTY THAT DOCUMENTATION, IF PROVIDED, WILL CONFORM TO THE SUBJECT SOFTWARE. THIS AGREEMENT DOES NOT, IN ANY MANNER, CONSTITUTE AN ENDORSEMENT BY GOVERNMENT AGENCY OR ANY PRIOR RECIPIENT OF ANY RESULTS, RESULTING DESIGNS, HARDWARE, SOFTWARE PRODUCTS OR ANY OTHER APPLICATIONS RESULTING FROM USE OF THE SUBJECT SOFTWARE.  FURTHER, GOVERNMENT AGENCY DISCLAIMS ALL WARRANTIES AND LIABILITIES REGARDING THIRD-PARTY SOFTWARE, IF PRESENT IN THE ORIGINAL SOFTWARE, AND DISTRIBUTES IT "AS IS."

Waiver and Indemnity:  RECIPIENT AGREES TO WAIVE ANY AND ALL CLAIMS AGAINST THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT.  IF RECIPIENT'S USE OF THE SUBJECT SOFTWARE RESULTS IN ANY LIABILITIES, DEMANDS, DAMAGES, EXPENSES OR LOSSES ARISING FROM SUCH USE, INCLUDING ANY DAMAGES FROM PRODUCTS BASED ON, OR RESULTING FROM, RECIPIENT'S USE OF THE SUBJECT SOFTWARE, RECIPIENT SHALL INDEMNIFY AND HOLD HARMLESS THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT, TO THE EXTENT PERMITTED BY LAW.  RECIPIENT'S SOLE REMEDY FOR ANY SUCH MATTER SHALL BE THE IMMEDIATE, UNILATERAL TERMINATION OF THIS AGREEMENT. 



            
