Overview
========
**MIKA** (Manager for Intelligent Knowledge Access) is a toolkit intended to assist design-time risk
analysis and safety assurance via advanced natural language processing capabilities.
The full documentation is available at: https://nasa.github.io/mika/
State-of-the-art natural language processing (NLP) techniques enable new ways to access safety-relevant
knowledge available in text-based documents. MIKA packages advanced NLP techniques and uses models
specially trained for engineering applications to allow engineers to better tap into knowledge available in
safety reports, accident reports, incident reports, lessons learned documents, and other engineering
documents.
To this end, the MIKA open-source toolkit has been developed for the following uses:
#. Enabling rapid exploration of a set of text-based engineering documents
#. Analyzing large, unstructured datasets, or exploiting structure in data when it is available
(flexibility)
#. Increasing the value of engineering documents through adding metadata, analyses, and summaries
Key Features
------------
MIKA includes two key capabilities, Knowledge Discovery and Information Retrieval, for exploring text-based
repositories. Both use BERT models as a backbone for multiple functions.
Knowledge Discovery (KD) enables the user to extract useful, meaningful information from narrative-based
engineering documents. This includes both supervised and unsupervised methods, such as:
#. A variety of topic modeling methods
#. Custom named-entity recognition (NER) for extracting a Failure Modes and Effects Analysis (FMEA)-style table
#. The ability to analyze trends in hazards or failures
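As a rough illustration of the trend-analysis idea in the list above, the sketch below counts how often each hazard label appears per year. It uses only the Python standard library and does not call MIKA; the records and the ``hazard_counts_by_year`` helper are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (year, hazard label) pairs, e.g. the output of a tagging step.
# These values are made up for illustration and are not real report data.
records = [
    (2019, "engine failure"),
    (2019, "bird strike"),
    (2020, "engine failure"),
    (2020, "engine failure"),
    (2021, "bird strike"),
]


def hazard_counts_by_year(rows):
    """Return {hazard: Counter({year: count})} for the given rows."""
    counts = defaultdict(Counter)
    for year, hazard in rows:
        counts[hazard][year] += 1
    return counts


trends = hazard_counts_by_year(records)
for hazard in sorted(trends):
    # Build a (year, count) series in chronological order for each hazard.
    series = [(year, trends[hazard][year]) for year in sorted(trends[hazard])]
    print(hazard, series)
```

A per-year series like this is the kind of input a plotting or statistical trend test would start from.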
Information Retrieval (IR) enables the user to search a set of documents and obtain relevant documents
or passages according to their query. This includes:
#. An information retrieval pipeline using a bi-encoder and cross-encoder with options for users to
choose from pretrained or custom models
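The two-stage pattern described above (retrieve with a bi-encoder, then re-rank with a cross-encoder) can be sketched without any model downloads. The example below is a toy illustration, not MIKA's API: ``embed`` is a bag-of-words stand-in for a bi-encoder, ``rerank_score`` is a term-overlap stand-in for a cross-encoder, and ``retrieve_and_rerank`` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: one bag-of-words vector per text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly.

    Here the score is the fraction of query terms found in the document.
    """
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve_and_rerank(query, docs, top_k=2):
    # Stage 1: documents are embedded independently of the query, so in a
    # real system these vectors could be computed once and stored.
    doc_vecs = [embed(d) for d in docs]
    q_vec = embed(query)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2: only the shortlisted candidates get the pairwise score.
    return sorted((docs[i] for i in candidates), key=lambda d: rerank_score(query, d), reverse=True)


# Hypothetical mini-corpus for illustration only.
docs = [
    "hydraulic pump failure during landing gear retraction",
    "crew reported smoke in the cabin after takeoff",
    "landing gear failure caused by hydraulic leak",
]
results = retrieve_and_rerank("hydraulic landing gear failure", docs, top_k=2)
for doc in results:
    print(doc)
```

The design point is cost: the first stage is cheap enough to run over every document, while the second, more expensive pairwise score is applied only to the top ``top_k`` candidates.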
Installation
---------------
MIKA is available on PyPI and can be installed with:

.. code-block:: bash

    pip install nasa-mika

Note that some users have had issues with certain MIKA dependencies, such as HDBSCAN. If you encounter an issue installing a dependency via pip, we recommend first installing the dependency using conda prior to installing MIKA, for example:

.. code-block:: bash

    conda install -c conda-forge hdbscan
    pip install nasa-mika

After installing MIKA, download the NLTK ``words`` corpus by running the following in Python:

.. code-block:: python

    import nltk
    nltk.download('words')

Also, download the spaCy transformer model by running the following command:

.. code-block:: bash

    python -m spacy download en_core_web_trf

Now you can import anything in MIKA:

.. code-block:: python

    from mika.kd import FMEA
    from mika.kd import Topic_Model_plus
    from mika.kd.trend_analysis import *
    from mika.kd.NER import *
    from mika.ir import search

    from mika.utils import Data
    from mika.utils.SAFECOM import *
    from mika.utils.SAFENET import *
    from mika.utils.LLIS import *
    from mika.utils.ICS import *

The latest version of MIKA is also available from the NASA GitHub page:

.. code-block:: bash

    git clone https://github.com/nasa/mika.git

MIKA includes three custom large language models, which can be found on the NASA Hugging Face page at: https://huggingface.co/NASA-AIML

Examples in MIKA use specific datasets that are NOT included in the software distribution. However, they can be created by following the instructions in the documentation at: https://nasa.github.io/mika/data.html
Prerequisites
-------------
MIKA uses Python 3 and has been tested on Python >= 3.8. We recommend installing PyTorch via Anaconda first and configuring it for GPU use if desired. If installing via pip, all prerequisites are included.
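Whether PyTorch can actually see a GPU can be checked at runtime. The snippet below is a minimal sketch, assuming PyTorch is importable as ``torch``; ``pick_device`` is a hypothetical helper, not part of MIKA, and it falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```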
Alternatively, you can manually clone MIKA and install the requirements. MIKA requires the following packages, listed in ``requirements.txt``, and their dependencies:

.. code-block:: text

    BERTopic
    datasets
    gensim
    matplotlib
    nltk
    numpy
    octis
    pandas
    pathlib
    pingouin
    pkg_resources
    pyLDAvis
    regex
    scikit-learn
    scipy
    seaborn
    sentence-transformers
    spacy
    symspellpy
    tomotopy
    torch
    transformers
    wordcloud

These can be installed with pip.
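After a manual install, a quick check can confirm that the Python version and a few of the packages above are in place. This is a sketch using only the standard library; the ``MODULES`` selection and the ``missing_modules`` helper are illustrative assumptions, and the import names shown are the usual ones for those packages (for example, scikit-learn is imported as ``sklearn``).

```python
import importlib.util
import sys

# Import names for a subset of the packages listed above. The import name
# can differ from the PyPI name (scikit-learn -> sklearn, BERTopic -> bertopic).
MODULES = [
    "bertopic",
    "nltk",
    "sklearn",
    "sentence_transformers",
    "spacy",
    "torch",
    "transformers",
]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 8)
missing = missing_modules(MODULES)
print("Python >= 3.8:", python_ok)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

``find_spec`` only locates a module without importing it, so the check stays fast even for heavy packages such as ``torch``.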
Additional packages needed for optional functions include:

.. code-block:: text

    graphviz #(to plot hierarchical topic models)
    pickle #(to save results)
    jupyter notebook #(to view examples in the repository)

Support
-------
MIKA is considered research code and is under development to refine features, add new capabilities, and
improve workflows. Certain functions may change over time. Please contact the contributors if any bugs or
issues are present.
Contributors
------------
`Hannah Walsh <https://github.com/walshh>`_ : Semantic Search capability, Custom Information Retrieval
capability, Topic Model Plus, Data utility, Documentation
`Sequoia Andrade <https://github.com/sequoiarose>`_ : FMEA capability, custom NER, Trend Analysis, Topic
Model Plus, Data utility, Dataset-specific utilities, Code Review, Documentation
Notices
-------
Copyright © 2023 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. All Rights Reserved.
Disclaimers
~~~~~~~~~~~
No Warranty: THE SUBJECT SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR FREEDOM FROM INFRINGEMENT, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL BE ERROR FREE, OR ANY WARRANTY THAT DOCUMENTATION, IF PROVIDED, WILL CONFORM TO THE SUBJECT SOFTWARE. THIS AGREEMENT DOES NOT, IN ANY MANNER, CONSTITUTE AN ENDORSEMENT BY GOVERNMENT AGENCY OR ANY PRIOR RECIPIENT OF ANY RESULTS, RESULTING DESIGNS, HARDWARE, SOFTWARE PRODUCTS OR ANY OTHER APPLICATIONS RESULTING FROM USE OF THE SUBJECT SOFTWARE. FURTHER, GOVERNMENT AGENCY DISCLAIMS ALL WARRANTIES AND LIABILITIES REGARDING THIRD-PARTY SOFTWARE, IF PRESENT IN THE ORIGINAL SOFTWARE, AND DISTRIBUTES IT "AS IS."
Waiver and Indemnity: RECIPIENT AGREES TO WAIVE ANY AND ALL CLAIMS AGAINST THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT. IF RECIPIENT'S USE OF THE SUBJECT SOFTWARE RESULTS IN ANY LIABILITIES, DEMANDS, DAMAGES, EXPENSES OR LOSSES ARISING FROM SUCH USE, INCLUDING ANY DAMAGES FROM PRODUCTS BASED ON, OR RESULTING FROM, RECIPIENT'S USE OF THE SUBJECT SOFTWARE, RECIPIENT SHALL INDEMNIFY AND HOLD HARMLESS THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT, TO THE EXTENT PERMITTED BY LAW. RECIPIENT'S SOLE REMEDY FOR ANY SUCH MATTER SHALL BE THE IMMEDIATE, UNILATERAL TERMINATION OF THIS AGREEMENT.
Raw data
{
"_id": null,
"home_page": "https://github.com/nasa/mika.git",
"name": "nasa-mika",
"maintainer": "sequoiarose",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "sequoia.r.andrade@nasa.gov",
"keywords": "Natural Language Processing, Knowledge Management, Topic Modeling",
"author": "Hannah Walsh and Sequoia Andrade",
"author_email": "hannah.s.walsh@nasa.gov",
"download_url": "https://files.pythonhosted.org/packages/ff/cc/9dc8bcb4765585a5b9e1edc2107458601e3def2f82bdcd1bfa88e0398e04/nasa-mika-1.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Manager for Intelligent Knowledge Access (MIKA)",
"version": "1.0.3",
"project_urls": {
"Documentation": "https://nasa.github.io/mika/",
"Download": "https://github.com/nasa/mika/archive/refs/tags/v1.0.3.tar.gz",
"Homepage": "https://github.com/nasa/mika.git"
},
"split_keywords": [
"natural language processing",
" knowledge management",
" topic modeling"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ffcc9dc8bcb4765585a5b9e1edc2107458601e3def2f82bdcd1bfa88e0398e04",
"md5": "fce5afb88070a2a9695b3164e99be52c",
"sha256": "e388a909f2799092efb1e248cdb984eb0bb007c05728fb76a24d541738570169"
},
"downloads": -1,
"filename": "nasa-mika-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "fce5afb88070a2a9695b3164e99be52c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 81186,
"upload_time": "2024-07-25T20:47:22",
"upload_time_iso_8601": "2024-07-25T20:47:22.817594Z",
"url": "https://files.pythonhosted.org/packages/ff/cc/9dc8bcb4765585a5b9e1edc2107458601e3def2f82bdcd1bfa88e0398e04/nasa-mika-1.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-25 20:47:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nasa",
"github_project": "mika",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "nasa-mika"
}