muzlin


:Name: muzlin
:Version: 0.0.5
:Summary: Muzlin: a filtering toolset for semantic machine learning
:Upload time: 2024-12-16 10:22:47
:Maintainer: Daniel Kulik
:Author: Daniel Kulik
:Requires Python: >=3.8
:License: MIT License Copyright (c) 2024 Daniel Kulik Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
:Keywords: rag, outlier-detection, filtering, retrieval, semantic, ml, llm, nlp
:Requirements: No requirements were recorded.

.. image:: https://raw.githubusercontent.com/KulikDM/muzlin/main/images/Logo.png
   :target: https://raw.githubusercontent.com/KulikDM/muzlin/main/images/Logo.png
   :alt: Muzlin

*When a filter cloth 🏳️ is needed rather than a simple RAG 🏴‍☠*

**Deployment, Stats, & License**

|badge_pypi| |badge_testing| |badge_coverage| |badge_maintainability| |badge_stars|
|badge_downloads| |badge_versions| |badge_licence|

.. |badge_pypi| image:: https://img.shields.io/pypi/v/muzlin.svg?color=brightgreen&logo=pypi&logoColor=white
   :alt: PyPI version
   :target: https://pypi.org/project/muzlin/

.. |badge_testing| image:: https://github.com/KulikDM/muzlin/actions/workflows/ci.yml/badge.svg
   :alt: testing
   :target: https://github.com/KulikDM/muzlin/actions/workflows/ci.yml

.. |badge_coverage| image:: https://codecov.io/gh/KulikDM/muzlin/graph/badge.svg?token=O93AVDHCXV
   :alt: Codecov
   :target: https://codecov.io/gh/KulikDM/muzlin

.. |badge_maintainability| image:: https://api.codeclimate.com/v1/badges/50c3f73536bcc37f4e2f/maintainability
   :alt: Maintainability
   :target: https://codeclimate.com/github/KulikDM/muzlin/maintainability

.. |badge_stars| image:: https://img.shields.io/github/stars/KulikDM/muzlin.svg?logo=github&logoColor=white&style=flat
   :alt: GitHub stars
   :target: https://github.com/KulikDM/muzlin/stargazers

.. |badge_downloads| image:: https://img.shields.io/badge/dynamic/xml?url=https%3A%2F%2Fstatic.pepy.tech%2Fbadge%2Fmuzlin&query=%2F%2F*%5Blocal-name()%20%3D%20%27text%27%5D%5Blast()%5D&logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPHN2ZyBzdHlsZT0iZW5hYmxlLWJhY2tncm91bmQ6bmV3IDAgMCAyNCAyNDsiIHZlcnNpb249IjEuMSIgdmlld0JveD0iMCAwIDI0IDI0IiB4bWw6c3BhY2U9InByZXNlcnZlIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIj48ZyBpZD0iaW5mbyIvPjxnIGlkPSJpY29ucyI%2BPGcgaWQ9InNhdmUiPjxwYXRoIGQ9Ik0xMS4yLDE2LjZjMC40LDAuNSwxLjIsMC41LDEuNiwwbDYtNi4zQzE5LjMsOS44LDE4LjgsOSwxOCw5aC00YzAsMCwwLjItNC42LDAtN2MtMC4xLTEuMS0wLjktMi0yLTJjLTEuMSwwLTEuOSwwLjktMiwyICAgIGMtMC4yLDIuMywwLDcsMCw3SDZjLTAuOCwwLTEuMywwLjgtMC44LDEuNEwxMS4yLDE2LjZ6IiBmaWxsPSIjZWJlYmViIi8%2BPHBhdGggZD0iTTE5LDE5SDVjLTEuMSwwLTIsMC45LTIsMnYwYzAsMC42LDAuNCwxLDEsMWgxNmMwLjYsMCwxLTAuNCwxLTF2MEMyMSwxOS45LDIwLjEsMTksMTksMTl6IiBmaWxsPSIjZWJlYmViIi8%2BPC9nPjwvZz48L3N2Zz4%3D&label=downloads
   :alt: Downloads
   :target: https://pepy.tech/project/muzlin

.. |badge_versions| image:: https://img.shields.io/pypi/pyversions/muzlin.svg?logo=python&logoColor=white
   :alt: Python versions
   :target: https://pypi.org/project/muzlin/

.. |badge_licence| image:: https://img.shields.io/github/license/KulikDM/muzlin.svg?logo=data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjMyIiBpZD0iaWNvbiIgdmlld0JveD0iMCAwIDMyIDMyIiB3aWR0aD0iMzIiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+PGRlZnMgZmlsbD0iI2ViZjJlZSI+PHN0eWxlPgogICAgICAuY2xzLTEgewogICAgICAgIGZpbGw6IG5vbmU7CiAgICAgIH0KICAgIDwvc3R5bGU+PC9kZWZzPjxyZWN0IGhlaWdodD0iMiIgd2lkdGg9IjEyIiB4PSI4IiB5PSI2IiBmaWxsPSIjZWJmMmVlIi8+PHJlY3QgaGVpZ2h0PSIyIiB3aWR0aD0iMTIiIHg9IjgiIHk9IjEwIiBmaWxsPSIjZWJmMmVlIi8+PHJlY3QgaGVpZ2h0PSIyIiB3aWR0aD0iNiIgeD0iOCIgeT0iMTQiIGZpbGw9IiNlYmYyZWUiLz48cmVjdCBoZWlnaHQ9IjIiIHdpZHRoPSI0IiB4PSI4IiB5PSIyNCIgZmlsbD0iI2ViZjJlZSIvPjxwYXRoIGQ9Ik0yOS43MDcsMTkuMjkzbC0zLTNhLjk5OTQuOTk5NCwwLDAsMC0xLjQxNCwwTDE2LDI1LjU4NTlWMzBoNC40MTQxbDkuMjkyOS05LjI5M0EuOTk5NC45OTk0LDAsMCwwLDI5LjcwNywxOS4yOTNaTTE5LjU4NTksMjhIMThWMjYuNDE0MWw1LTVMMjQuNTg1OSwyM1pNMjYsMjEuNTg1OSwyNC40MTQxLDIwLDI2LDE4LjQxNDEsMjcuNTg1OSwyMFoiIGZpbGw9IiNlYmYyZWUiLz48cGF0aCBkPSJNMTIsMzBINmEyLjAwMjEsMi4wMDIxLDAsMCwxLTItMlY0QTIuMDAyMSwyLjAwMjEsMCwwLDEsNiwySDIyYTIuMDAyMSwyLjAwMjEsMCwwLDEsMiwyVjE0SDIyVjRINlYyOGg2WiIgZmlsbD0iI2ViZjJlZSIvPjxyZWN0IGNsYXNzPSJjbHMtMSIgZGF0YS1uYW1lPSImbHQ7VHJhbnNwYXJlbnQgUmVjdGFuZ2xlJmd0OyIgaGVpZ2h0PSIzMiIgaWQ9Il9UcmFuc3BhcmVudF9SZWN0YW5nbGVfIiB3aWR0aD0iMzIiIGZpbGw9IiNlYmYyZWUiLz48L3N2Zz4=
   :alt: License
   :target: https://github.com/KulikDM/muzlin/blob/main/LICENSE

----

#############
 What is it?
#############

Muzlin merges classical ML with advanced generative AI to efficiently
filter text in the context of NLP and LLMs. It helps answer key
questions in semantic-based workflows, such as:

-  Does a RAG/GraphRAG have the right context to answer a question?

-  Is the top-k retrieved context too dense or too sparse?

-  Does the generated response hallucinate or deviate from the provided
   context?

-  Should new extracted text be added to an existing RAG?

-  Can we detect inliers and outliers in collections of text embeddings
   (e.g. context, user questions and answers, synthetic generated data,
   etc.)?
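
Several of the questions above come down to comparing embeddings. As a
rough, hedged illustration (not Muzlin's implementation), the sketch
below assumes the query and the retrieved chunks are already embedded as
NumPy arrays and uses cosine similarity; ``context_stats`` and the toy
vectors are hypothetical. A low ``max_query_sim`` hints that the
retrieved context may not cover the question, while a very high or very
low ``mean_pairwise_sim`` suggests the top-k chunks are dense
(near-duplicates) or sparse.

.. code:: python

   import numpy as np


   def _normalize(x: np.ndarray) -> np.ndarray:
       """Scale each row to unit length so dot products equal cosine similarity."""
       x = np.atleast_2d(np.asarray(x, dtype=float))
       return x / np.linalg.norm(x, axis=1, keepdims=True)


   def context_stats(query: np.ndarray, topk: np.ndarray) -> dict:
       """Hypothetical helper: summarise how a query relates to its top-k context.

       query: 1-D embedding of the question (or of a generated answer).
       topk:  2-D array with one row per retrieved chunk embedding (k >= 2).
       """
       q = _normalize(query)[0]
       c = _normalize(topk)
       query_sims = c @ q                  # cosine similarity of the query to each chunk
       pair = c @ c.T                      # chunk-to-chunk cosine similarities
       iu = np.triu_indices(len(c), k=1)   # upper triangle, excluding the diagonal
       return {
           "max_query_sim": float(query_sims.max()),
           "mean_query_sim": float(query_sims.mean()),
           "mean_pairwise_sim": float(pair[iu].mean()),
       }


   # Toy 3-D embeddings standing in for real encoder output.
   chunks = np.array([[1.0, 0.0, 0.0],
                      [0.9, 0.1, 0.0],
                      [0.8, 0.2, 0.0]])
   on_topic = context_stats(np.array([1.0, 0.05, 0.0]), chunks)
   off_topic = context_stats(np.array([0.0, 0.0, 1.0]), chunks)
   print(on_topic["max_query_sim"] > off_topic["max_query_sim"])  # True

The same function can be pointed at an answer embedding instead of a
question embedding to get a crude signal of whether a generated response
drifts away from the provided context.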

**Note:** While production-ready, Muzlin is still evolving and subject
to significant changes!

############
 Quickstart
############

#. **Install** Muzlin using pip:

   .. code:: bash

      pip install muzlin

#. **Create text embeddings** with a pre-trained model:

   .. code:: python

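      import numpy as np
      from muzlin.encoders import HuggingFaceEncoder  # Requires torch and transformers

      # Your corpus: a list of strings
      texts = ['Neil Armstrong was the first person to walk on the Moon.',
               'Apollo 11 landed on the Moon in July 1969.']

      encoder = HuggingFaceEncoder()
      vectors = encoder(texts)  # One embedding per input string
      vectors = np.array(vectors)
      np.save('vectors', vectors)  # Writes vectors.npy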

#. **Build an anomaly detection model** for filtering:

   .. code:: python

      import numpy as np
      from muzlin.anomaly import OutlierDetector
      from pyod.models.pca import PCA

      vectors = np.load('vectors.npy')  # Load pre-saved vectors

      od = PCA(contamination=0.02)  # Expect roughly 2% of the training data to be outliers

      clf = OutlierDetector(mlflow=False, detector=od)  # Saves a joblib model
      clf.fit(vectors)

#. **Filter new text** using the trained model:

   .. code:: python

      from muzlin.anomaly import OutlierDetector
      from muzlin.encoders import HuggingFaceEncoder
      import numpy as np

      clf = OutlierDetector(model='outlier_detector.pkl')  # Load the model
      encoder = HuggingFaceEncoder()

      vector = encoder(['Who was the first man to walk on the moon?'])
      vector = np.array(vector).reshape(1, -1)

      label = clf.predict(vector)
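
To build intuition for the fit and predict steps above, here is a
simplified, NumPy-only stand-in for a PCA-based detector. It is an
assumption-laden sketch, not PyOD's ``PCA`` (which scores points
differently) and not Muzlin's ``OutlierDetector``; ``ToyPCADetector``
is a hypothetical name. It scores each vector by its PCA reconstruction
error and, as PyOD does, places the threshold at the
``1 - contamination`` percentile of the training scores, labelling
outliers 1 and inliers 0.

.. code:: python

   import numpy as np


   class ToyPCADetector:
       """Hypothetical stand-in: scores points by PCA reconstruction error."""

       def __init__(self, n_components: int = 2, contamination: float = 0.02):
           self.n_components = n_components
           self.contamination = contamination

       def fit(self, X: np.ndarray) -> "ToyPCADetector":
           X = np.asarray(X, dtype=float)
           self.mean_ = X.mean(axis=0)
           # Principal axes are the top right-singular vectors of the centred data.
           _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
           self.components_ = vt[: self.n_components]
           scores = self.decision_function(X)
           # Flag roughly `contamination` of the training set as outliers.
           self.threshold_ = np.percentile(scores, 100 * (1 - self.contamination))
           return self

       def decision_function(self, X: np.ndarray) -> np.ndarray:
           """Squared distance between each point and its PCA reconstruction."""
           Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean_
           recon = Xc @ self.components_.T @ self.components_
           return ((Xc - recon) ** 2).sum(axis=1)

       def predict(self, X: np.ndarray) -> np.ndarray:
           """Return 1 for outliers and 0 for inliers."""
           return (self.decision_function(X) > self.threshold_).astype(int)


   rng = np.random.default_rng(0)
   # Synthetic "embeddings" lying close to a 2-D plane inside 8-D space.
   basis = rng.normal(size=(2, 8))
   train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))

   clf = ToyPCADetector(n_components=2, contamination=0.02).fit(train)
   in_plane = rng.normal(size=(1, 2)) @ basis             # resembles the training data
   off_plane = in_plane + 5.0 * rng.normal(size=(1, 8))   # pushed off the plane
   print(clf.predict(in_plane), clf.predict(off_plane))

Real sentence embeddings are far higher-dimensional and less tidy, which
is why the ``contamination`` value (or a data-driven threshold) matters
in practice.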

##############
 Integrations
##############

Muzlin integrates with a wide array of libraries for anomaly detection,
vector encoding, and graph-based setups.

+-----------------------------------+-------------------------+----------------------+
| **Anomaly Detection**             | **Encoders**            | **Vector Index**     |
+===================================+=========================+======================+
| -  Scikit-Learn                   | -  HuggingFace          | -  LangChain         |
| -  PyOD (vector)                  | -  OpenAI               | -  LlamaIndex        |
| -  PyGOD (graph)                  | -  Cohere               |                      |
| -  PyThresh (thresholding)        | -  Azure                |                      |
|                                   | -  Google               |                      |
|                                   | -  Amazon Bedrock       |                      |
|                                   | -  Fastembed            |                      |
|                                   | -  Mistral              |                      |
|                                   | -  VoyageAI             |                      |
+-----------------------------------+-------------------------+----------------------+
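
PyThresh, listed above, replaces a hand-picked contamination rate with a
threshold derived from the outlier scores themselves. As a hedged
illustration of that idea only (this is not PyThresh's API, and
``mad_threshold`` is a hypothetical helper), a median absolute deviation
rule flags scores that sit far above the median:

.. code:: python

   import numpy as np


   def mad_threshold(scores, k: float = 3.0) -> np.ndarray:
       """Hypothetical helper: label scores above median + k * MAD as outliers.

       Returns an integer array with 1 for outliers and 0 for inliers,
       matching the PyOD labelling convention.
       """
       scores = np.asarray(scores, dtype=float)
       median = np.median(scores)
       # Scale the MAD by 1.4826 so it estimates the standard deviation
       # when the scores are normally distributed.
       mad = 1.4826 * np.median(np.abs(scores - median))
       return (scores > median + k * mad).astype(int)


   scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.95]
   print(mad_threshold(scores))  # [0 0 0 0 0 0 1]

Because the cut-off adapts to the spread of the scores, no outlier
proportion has to be guessed in advance; the trade-off is the choice of
``k``.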

**Simple Schematic Implementation**

.. image:: https://raw.githubusercontent.com/KulikDM/muzlin/main/images/Simple_Example.png
   :target: https://raw.githubusercontent.com/KulikDM/muzlin/main/images/Simple_Example.png
   :alt: Muzlin Pipeline

----

###########
 Resources
###########

**Example Notebooks**

+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+
| Notebook                                                                                                          | Description                                                                 |
+===================================================================================================================+=============================================================================+
| `Introduction <https://github.com/KulikDM/muzlin/blob/main/examples/00_Introduction.ipynb>`_                      | Basic semantic vector-based outlier detection                               |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+
| `Optimal Threshold <https://github.com/KulikDM/muzlin/blob/main/examples/01_Threshold_Optimization.ipynb>`_       | Selecting optimal thresholds using various methods                          |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+
| `Cluster-Based Filtering <https://github.com/KulikDM/muzlin/blob/main/examples/02_Cluster_Filtering.ipynb>`_      | Cluster-based filtering for question answering                              |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+
| `Graph-Based Filtering <https://github.com/KulikDM/muzlin/blob/main/examples/03_Graph_Filtering.ipynb>`_          | Using graph-based anomaly detection for semantic graphs like GraphRAG       |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+
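
The cluster-based notebook applies clustering to question answering. The
sketch below only illustrates the general idea under stated assumptions
and is not the notebook's code: it runs a small k-means (with a
deterministic farthest-point initialisation, chosen here for
reproducibility) over context embeddings, records each cluster's radius,
and treats a query as unsupported when it falls outside the radius of
its nearest cluster. ``fit_clusters`` and ``is_supported`` are
hypothetical names.

.. code:: python

   import numpy as np


   def fit_clusters(X: np.ndarray, k: int, n_iter: int = 20):
       """Hypothetical helper: plain k-means with farthest-point initialisation.

       Returns (centroids, radii), where each radius is the largest distance
       from a centroid to a member of its cluster.
       """
       X = np.asarray(X, dtype=float)
       centroids = [X[0]]
       for _ in range(1, k):
           # Pick the point farthest from every centroid chosen so far.
           d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
           centroids.append(X[np.argmax(d)])
       centroids = np.array(centroids)

       for _ in range(n_iter):
           dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
           labels = dists.argmin(axis=1)
           # Keep the old centroid if a cluster ends up empty.
           centroids = np.array([
               X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
               for j in range(k)
           ])

       dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
       labels = dists.argmin(axis=1)
       radii = np.array([
           dists[labels == j, j].max() if np.any(labels == j) else 0.0
           for j in range(k)
       ])
       return centroids, radii


   def is_supported(query: np.ndarray, centroids: np.ndarray, radii: np.ndarray) -> bool:
       """True if the query lies within the radius of its nearest cluster."""
       d = np.linalg.norm(centroids - np.asarray(query, dtype=float), axis=1)
       j = int(d.argmin())
       return bool(d[j] <= radii[j])


   rng = np.random.default_rng(1)
   context = np.vstack([
       rng.normal(loc=0.0, scale=0.1, size=(50, 4)),   # topic A
       rng.normal(loc=3.0, scale=0.1, size=(50, 4)),   # topic B
   ])
   centroids, radii = fit_clusters(context, k=2)
   print(is_supported(np.full(4, 3.0), centroids, radii))    # True: near topic B
   print(is_supported(np.full(4, 10.0), centroids, radii))   # False: far from both

Using the maximum member distance as the radius is the loosest possible
rule; a percentile of member distances would make the filter stricter.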

############
 What Else?
############

Looking for more? Check out other useful libraries like `Semantic Router
<https://github.com/aurelio-labs/semantic-router>`_, `CRAG
<https://github.com/HuskyInSalt/CRAG>`_, and `Scikit-LLM
<https://github.com/iryna-kondr/scikit-llm>`_.

----

##############
 Contributing
##############

**Muzlin is still evolving!** Major changes are currently underway and
the structure of Muzlin is still being refined. For now, please open a
bug report and include any proposed code for fixes or improvements. You
will be added as a co-author if your contribution is implemented.

Once this phase is complete, anyone will be welcome to contribute to
Muzlin:

-  Please share your ideas and ask questions by opening an issue.

-  To contribute, first check the issue list for the "help wanted" tag
   and comment on the issue you are interested in. It will then be
   assigned to you.

-  If the bug, feature, or documentation change is novel (not in the
   Issue list), you can either log a new issue or create a pull request
   for the new changes.

-  To start, fork the **dev branch** and add your
   improvement/modification/fix.

-  To keep code style and standards consistent, please refer to
   ``detector.py`` as an example.

-  Create a pull request to the **dev branch** and follow the `PR
   template
   <https://github.com/KulikDM/muzlin/blob/main/.github/PULL_REQUEST_TEMPLATE.md>`_.

-  Please make sure that all code changes are accompanied by new or
   updated tests. Automatic tests will be triggered, and all tests must
   pass before the pull request can be merged.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "muzlin",
    "maintainer": "Daniel Kulik",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "RAG, outlier-detection, filtering, retrieval, semantic, ML, LLM, NLP",
    "author": "Daniel Kulik",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/98/27/ff92c3049bc586c1e8f64fc9af322477f60248008d4886d1137b1c59a021/muzlin-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024 Daniel Kulik  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Muzlin: a filtering toolset for semantic machine learning",
    "version": "0.0.5",
    "project_urls": {
        "Homepage": "https://github.com/KulikDM/muzlin",
        "Repository": "https://github.com/KulikDM/muzlin"
    },
    "split_keywords": [
        "rag",
        " outlier-detection",
        " filtering",
        " retrieval",
        " semantic",
        " ml",
        " llm",
        " nlp"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "166e08da9deaa57239facf868aae4bb90b228a14891435096d1ec0c41522fb01",
                "md5": "42f185ce0550674d5e2cc3619da0fc54",
                "sha256": "3d0b94dfa54aad3ddeeaccb564b632561f7038ed7b767ec1ebcba56721f595c4"
            },
            "downloads": -1,
            "filename": "muzlin-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "42f185ce0550674d5e2cc3619da0fc54",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 38715,
            "upload_time": "2024-12-16T10:22:43",
            "upload_time_iso_8601": "2024-12-16T10:22:43.925104Z",
            "url": "https://files.pythonhosted.org/packages/16/6e/08da9deaa57239facf868aae4bb90b228a14891435096d1ec0c41522fb01/muzlin-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9827ff92c3049bc586c1e8f64fc9af322477f60248008d4886d1137b1c59a021",
                "md5": "cdffc03bac7e695a3fd476f2411d16bc",
                "sha256": "d064c8a6e09f57a777d3d3c56942d890b23c4408e69d6662510e57ca85dbc880"
            },
            "downloads": -1,
            "filename": "muzlin-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "cdffc03bac7e695a3fd476f2411d16bc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 32180,
            "upload_time": "2024-12-16T10:22:47",
            "upload_time_iso_8601": "2024-12-16T10:22:47.437316Z",
            "url": "https://files.pythonhosted.org/packages/98/27/ff92c3049bc586c1e8f64fc9af322477f60248008d4886d1137b1c59a021/muzlin-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-16 10:22:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "KulikDM",
    "github_project": "muzlin",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "muzlin"
}
        