# GGUF Solver
## Overview
`GGUFSolver` is a question-answering module that utilizes GGUF models to provide responses to user queries. This solver
streams utterances for real-time interaction and is built on the `ovos_plugin_manager.templates.solvers.QuestionSolver`
framework.
## Features
- Supports loading GGUF models from local files or remote repositories.
- Streams partial responses for improved interactivity.
- Configurable persona and verbosity settings.
- Capable of providing spoken answers.
## Configuration
`GGUFSolver` requires a configuration dictionary. The configuration should at least specify the model to use. Here is an
example configuration:
```python
cfg = {
    "model": "TheBloke/notus-7B-v1-GGUF",
    "remote_filename": "*Q4_K_M.gguf"
}
```
- `model`: The identifier for the model. It can be a local file path or a repository ID for a remote model.
- `n_gpu_layers`: (Optional) Number of layers to offload to the GPU; use `-1` to offload all layers. Default is `0`.
- `remote_filename`: The specific filename to load from a remote repository.
- `chat_format`: (Optional) Chat formatting settings.
- `verbose`: (Optional) Set to `True` for detailed logging.
- `persona`: (Optional) Persona for the system messages. Default
is `"You are a helpful assistant who gives short factual answers"`.
- `max_tokens`: (Optional) Maximum tokens for the response. Default is `512`.
**NOTE**: for GPU support, llama.cpp needs to be built with CUDA enabled:
`CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir`
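For example, a configuration that loads a local model and offloads all layers to the GPU might look like the sketch below (the file path is illustrative, and full offload assumes a CUDA-enabled build of `llama-cpp-python`):
```python
# Hypothetical local-model configuration with full GPU offload.
# Point "model" at a real .gguf file on disk.
cfg = {
    "model": "/path/to/local/model/gguf_model.gguf",
    "n_gpu_layers": -1,   # offload all layers; requires a CUDA build
    "max_tokens": 256,
    "verbose": True
}
```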
## Usage
### Initializing the Solver
```python
from ovos_gguf_solver import GGUFSolver
from ovos_utils.log import LOG
LOG.set_level("DEBUG")
cfg = {
    "model": "TheBloke/notus-7B-v1-GGUF",
    "remote_filename": "*Q4_K_M.gguf"
}

solver = GGUFSolver(cfg)
```
### Streaming Utterances
Use the `stream_utterances` method to stream responses. This is particularly useful for real-time applications such as
voice assistants.
```python
query = "tell me a joke about aliens"
for sentence in solver.stream_utterances(query):
    print(sentence)
```
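In a voice assistant loop you would typically hand each sentence off for playback as soon as it arrives, instead of waiting for the full answer. A minimal sketch, where `speak` is a hypothetical stand-in for your actual TTS call:
```python
def speak(sentence: str):
    # Hypothetical placeholder for a real TTS call.
    print(f"[speaking] {sentence}")

for sentence in solver.stream_utterances(query):
    speak(sentence)  # playback can start before generation finishes
```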
### Getting a Full Answer
Use the `get_spoken_answer` method to get a complete response.
```python
query = "What is the capital of France?"
answer = solver.get_spoken_answer(query)
print(answer)
```
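For quick local testing, the solver can be wrapped in a simple command-line loop (a sketch, not part of the plugin):
```python
# Minimal REPL for trying the solver interactively.
while True:
    query = input("You: ").strip()
    if query.lower() in ("exit", "quit"):
        break
    print("Bot:", solver.get_spoken_answer(query))
```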
## Integrating with Persona Framework
To integrate `GGUFSolver` with the OVOS Persona Framework and pass solver configurations, follow these examples.
Each example demonstrates how to define a persona configuration file with specific settings for different models or configurations.
To use any of these configurations, run the OVOS Persona Server with the desired configuration file:
```bash
$ ovos-persona-server --persona gguf_persona_remote.json
```
Replace `gguf_persona_remote.json` with the filename of the configuration you wish to use.
### Example 1: Using a Remote GGUF Model
This example shows how to configure the `GGUFSolver` to use a remote GGUF model with a specific persona.
**`gguf_persona_remote.json`**:
```json
{
  "name": "Notus",
  "solvers": [
    "ovos-solver-gguf-plugin"
  ],
  "ovos-solver-gguf-plugin": {
    "model": "TheBloke/notus-7B-v1-GGUF",
    "remote_filename": "*Q4_K_M.gguf",
    "persona": "You are an advanced assistant providing detailed and accurate information.",
    "verbose": true
  }
}
```
In this configuration:
- `ovos-solver-gguf-plugin` is set to use a remote GGUF model `TheBloke/notus-7B-v1-GGUF` with the specified filename.
- The persona is configured to provide detailed and accurate information.
- `verbose` is set to `true` for detailed logging.
### Example 2: Using a Local GGUF Model
This example shows how to configure the `GGUFSolver` to use a local GGUF model.
**`gguf_persona_local.json`**:
```json
{
  "name": "LocalGGUFPersona",
  "solvers": [
    "ovos-solver-gguf-plugin"
  ],
  "ovos-solver-gguf-plugin": {
    "model": "/path/to/local/model/gguf_model.gguf",
    "persona": "You are a helpful assistant providing concise answers.",
    "max_tokens": 256
  }
}
```
In this configuration:
- `ovos-solver-gguf-plugin` is set to use a local GGUF model located at `/path/to/local/model/gguf_model.gguf`.
- The persona is configured to provide concise answers.
- `max_tokens` is set to `256` to limit the response length.
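A persona file can also be reused outside the Persona Server by reading this plugin's config block straight out of the JSON. A sketch using only the standard library, assuming `gguf_persona_local.json` is in the working directory:
```python
import json

from ovos_gguf_solver import GGUFSolver

# Read the persona file and pull out this plugin's config block.
with open("gguf_persona_local.json") as f:
    persona = json.load(f)

solver = GGUFSolver(persona["ovos-solver-gguf-plugin"])
print(solver.get_spoken_answer("What is the capital of France?"))
```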
### Example Models
These models are not endorsed; the list was compiled largely by searching Hugging Face and is provided for illustrative purposes only.
| Language | Model Name | URL | Description |
|------------|-----------------------------------------------------|--------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| English | CausalLM-14B-GGUF | [Link](https://huggingface.co/TheBloke/CausalLM-14B-GGUF) | A 14B parameter model compatible with Meta LLaMA 2, demonstrating top-tier performance among models with fewer than 70B parameters, optimized for both qualitative and quantitative evaluations, with strong consistency across versions. |
| English | Phi-3-Mini-4K-Instruct-GGUF | [Link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) | A lightweight 3.8B parameter model from the Phi-3 family, optimized for strong reasoning and long-context tasks with robust performance in instruction adherence and logical reasoning. |
| English | Qwen2-0.5B-Instruct-GGUF | [Link](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GGUF) | A 0.5B parameter instruction-tuned model from the Qwen2 series, excelling in language understanding, generation, and multilingual tasks with competitive performance against state-of-the-art models. |
| English | GritLM_-_GritLM-7B-gguf | [Link](https://huggingface.co/RichardErkhov/GritLM_-_GritLM-7B-gguf) | A unified model for both text generation and embedding tasks, achieving state-of-the-art performance in both areas and enhancing Retrieval-Augmented Generation (RAG) efficiency by over 60%. |
| English | falcon-7b-instruct-GGUF | [Link](https://huggingface.co/QuantFactory/falcon-7b-instruct-GGUF) | A 7B parameter instruct model based on Falcon-7B, optimized for chat and instruction tasks with performance benefits from extensive training on 1,500B tokens, and optimized inference architecture. |
| English | Samantha-Qwen-2-7B-GGUF | [Link](https://huggingface.co/QuantFactory/Samantha-Qwen-2-7B-GGUF) | A quantized 7B parameter model fine-tuned with QLoRa and FSDP, tailored for conversational tasks and utilizing datasets like OpenHermes-2.5 and Opus_Samantha. |
| English | Mistral-7B-Instruct-v0.3-GGUF | [Link](https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF) | An instruct fine-tuned model based on Mistral-7B-v0.3, featuring an extended vocabulary and support for function calling, aimed at demonstrating effective fine-tuning with room for improved moderation mechanisms. |
| English | Lite-Mistral-150M-v2-Instruct-GGUF | [Link](https://huggingface.co/OuteAI/Lite-Mistral-150M-v2-Instruct-GGUF) | A compact 150M parameter model optimized for efficiency on various devices, demonstrating reasonable performance in simple queries but facing challenges with context preservation and accuracy in multi-turn conversations. |
| English | TowerInstruct-7B-v0.1-GGUF | [Link](https://huggingface.co/TheBloke/TowerInstruct-7B-v0.1-GGUF) | A 7B parameter model fine-tuned on the TowerBlocks dataset for translation tasks, including general, context-aware, and terminology-aware translation, as well as named-entity recognition and grammatical error correction. |
| English | Dr_Samantha-7B-GGUF | [Link](https://huggingface.co/TheBloke/Dr_Samantha-7B-GGUF) | A merged model incorporating medical and psychological knowledge, with extensive performance on medical knowledge tasks and a focus on whole-person care. |
| English | phi-2-orange-GGUF | [Link](https://huggingface.co/rhysjones/phi-2-orange) | A finetuned model based on Phi-2, optimized with a two-step finetuning approach for improved performance in various evaluation metrics. The model is designed for Python-related tasks and general question answering. |
| English | phi-2-electrical-engineering-GGUF | [Link](https://huggingface.co/TheBloke/phi-2-electrical-engineering-GGUF) | The phi-2-electrical-engineering model excels in answering questions and generating code specifically for electrical engineering and Kicad software, boasting efficient deployment and a focus on technical accuracy within its 2.7 billion parameters. |
| English | Unholy-v2-13B-GGUF | [Link](https://huggingface.co/TheBloke/Unholy-v2-13B-GGUF) | An uncensored 13B parameter model merged with various models for an uncensored experience, designed to bypass typical content moderation filters. |
| English | CapybaraHermes-2.5-Mistral-7B-GGUF | [Link](https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF) | A preference-tuned 7B model using distilabel, optimized for multi-turn performance with improved scores in benchmarks like MTBench and Nous, compared to the Mistral-7B-Instruct-v0.2. |
| English | notus-7B-v1-GGUF | [Link](https://huggingface.co/TheBloke/notus-7B-v1-GGUF) | A 7B parameter model fine-tuned with Direct Preference Optimization (DPO), surpassing Zephyr-7B-beta and Claude 2 on AlpacaEval, designed for chat-like applications with improved preference-based performance. |
| English | Luna AI Llama2 Uncensored GGML | [Link](https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGML) | A Llama2-based chat model fine-tuned on over 40,000 long-form chat discussions. Optimized with synthetic outputs, available in both 4-bit GPTQ for GPU and GGML for CPU inference. Prompt format follows Vicuna 1.1/OpenChat style. |
| English | Zephyr-7B-β-GGUF | [Link](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF) | A 7B parameter model fine-tuned with Direct Preference Optimization (DPO) to enhance performance, optimized for helpfulness but may generate problematic text due to removed in-built alignment. |
| English | TinyLlama-1.1B-1T-OpenOrca-GGUF | [Link](https://huggingface.co/TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF) | A 1.1B parameter model fine-tuned on the OpenOrca GPT-4 subset, optimized for conversational tasks with a focus on efficiency and performance in the CHATML format. |
| English | LlongOrca-7B-16K-GGUF | [Link](https://huggingface.co/TheBloke/LlongOrca-7B-16K-GGUF) | A fine-tuned 7B parameter model optimized for long contexts, achieving top performance in long-context benchmarks and notable improvements over the base model, with efficient training using OpenChat's MultiPack algorithm. |
| English | Meta-Llama-3-8B-Instruct-GGUF | [Link](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF) | An 8B parameter instruction-tuned model from the Llama 3 series, optimized for dialogue and outperforming many open-source models on industry benchmarks, with a focus on helpfulness and safety through advanced fine-tuning techniques. |
| English | Smol-7B-GGUF | [Link](https://huggingface.co/TheBloke/smol-7B-GGUF) | A fine-tuned 7B parameter model from the Smol series, known for its strong performance in diverse NLP tasks and efficient fine-tuning techniques. |
| English | Smol-Llama-101M-Chat-v1-GGUF | [Link](https://huggingface.co/afrideva/Smol-Llama-101M-Chat-v1-GGUF) | A compact 101M parameter chat model optimized for diverse conversational tasks, showing balanced performance across multiple benchmarks with a focus on efficiency and low-resource scenarios. |
| English | Sonya-7B-GGUF | [Link](https://huggingface.co/TheBloke/Sonya-7B-GGUF) | A high-performing 7B model with excellent scores in MT-Bench, ideal for various tasks including assistant and roleplay, combining multiple sources to achieve superior performance. |
| English | WizardLM-7B-uncensored-GGML | [Link](https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML) | An uncensored 7B parameter model from the WizardLM series, designed without built-in alignment to allow for custom alignment via techniques like RLHF LoRA, with no guardrails and complete responsibility for content usage resting on the user. |
| English | OpenChat 3.5 | [Link](https://huggingface.co/TheBloke/openchat_3.5-GGUF) | A 7B parameter model that achieves comparable results with ChatGPT, excelling in MT-bench evaluations. It utilizes mixed-quality data and C-RLFT (a variant of offline reinforcement learning) for training. OpenChat 3.5 performs well across various benchmarks and has been optimized for high-throughput deployment. It is an open-source model with strong performance in chat-based applications. |
| Portuguese | PORTULAN_-_gervasio-7b-portuguese-ptpt-decoder-gguf | [Link](https://huggingface.co/RichardErkhov/PORTULAN_-_gervasio-7b-portuguese-ptpt-decoder-gguf) | Gervásio 7B PTPT is an open decoder for Portuguese, built on the LLaMA-2 7B model, fine-tuned with instruction data to excel in various Portuguese tasks, and designed to run on consumer-grade hardware with a focus on European Portuguese. |
| Portuguese | CabraLlama3-8b-GGUF | [Link](https://huggingface.co/mradermacher/CabraLlama3-8b-GGUF) | A refined version of Meta-Llama-3-8B-Instruct, optimized with the Cabra 30k dataset for understanding and responding in Portuguese, providing enhanced performance for Portuguese language tasks. |
| Portuguese | bode-7b-alpaca-pt-br-gguf | [Link](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-gguf) | Bode-7B is a fine-tuned LLaMA 2-based model designed for Portuguese, delivering satisfactory results in classification tasks and prompt-based applications. |
| Portuguese | bode-13b-alpaca-pt-br-gguf | [Link](https://huggingface.co/recogna-nlp/bode-13b-alpaca-pt-br-gguf) | Bode-13B is a fine-tuned LLaMA 2-based model for Portuguese prompts, offering enhanced performance over its 7B counterpart, and designed for both research and commercial applications with a focus on Portuguese language tasks. |
| Portuguese | sabia-7B-GGUF | [Link](https://huggingface.co/TheBloke/sabia-7B-GGUF) | Sabiá-7B is a Portuguese auto-regressive language model based on LLaMA-1-7B, pretrained on a large Portuguese dataset, offering high performance in few-shot tasks and generating text, with research-only licensing. |
| Portuguese | OpenHermesV2-PTBR-portuguese-brazil-gguf | [Link](https://huggingface.co/skoll520/OpenHermesV2-PTBR-portuguese-brazil-gguf) | A finetuned version of Mistral 7B trained on diverse GPT-4 generated data, designed for Portuguese, with extensive filtering and transformation for enhanced performance. |
| Catalan | CataLlama-v0.2-Instruct-SFT-DPO-Merged-GGUF | [Link](https://huggingface.co/catallama/CataLlama-v0.2-Instruct-SFT-DPO-Merged-GGUF) | An instruction-tuned model optimized with DPO for various NLP tasks in Catalan, including translation, NER, summarization, and sentiment analysis, built on an auto-regressive transformer architecture. |
> The models listed are suggestions. The best model for your use case will depend on your specific requirements such as
> language, task complexity, and performance needs.