ovos-solver-gguf-plugin

Name: ovos-solver-gguf-plugin
Version: 0.0.0a4
Home page: https://github.com/TigreGotico/ovos-solver-gguf-plugin
Summary: A question solver plugin for OVOS
Upload time: 2024-10-25 20:34:17
Author: jarbasai
License: MIT
Keywords: ovos, openvoiceos, plugin, utterance, fallback, query
Requirements: ovos-plugin-manager, huggingface-hub, llama-cpp-python
# GGUF Solver

## Overview

`GGUFSolver` is a question-answering module that utilizes GGUF models to provide responses to user queries. This solver
streams utterances for real-time interaction and is built on the `ovos_plugin_manager.templates.solvers.QuestionSolver`
framework.

## Features

- Supports loading GGUF models from local files or remote repositories.
- Streams partial responses for improved interactivity.
- Configurable persona and verbosity settings.
- Capable of providing spoken answers.

## Configuration

`GGUFSolver` requires a configuration dictionary. The configuration should at least specify the model to use. Here is an
example configuration:

```python
cfg = {
    "model": "TheBloke/notus-7B-v1-GGUF",
    "remote_filename": "*Q4_K_M.gguf"
}
```

- `model`: The identifier for the model. It can be a local file path or a repository ID for a remote model.
- `n_gpu_layers`: Number of layers to offload to the GPU; `-1` offloads all layers. Default is `0`.
- `remote_filename`: The specific filename to load from a remote repository.
- `chat_format`: (Optional) Chat format (prompt template) settings to use with the model.
- `verbose`: (Optional) Set to `True` for detailed logging.
- `persona`: (Optional) Persona used for the system message. Default
  is `"You are a helpful assistant who gives short factual answers"`.
- `max_tokens`: (Optional) Maximum number of tokens in the response. Default is `512`.
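
Putting these options together, a fuller configuration might look like the following sketch (the values shown are illustrative, not recommendations):

```python
cfg = {
    "model": "TheBloke/notus-7B-v1-GGUF",  # repo ID, or a local .gguf path
    "remote_filename": "*Q4_K_M.gguf",     # filename glob within the remote repo
    "n_gpu_layers": -1,                    # offload all layers to GPU (default 0: CPU only)
    "verbose": True,                       # detailed logging
    "persona": "You are a helpful assistant who gives short factual answers",
    "max_tokens": 512,                     # cap on response length
}
```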

**NOTE**: For GPU support, `llama-cpp-python` needs to be installed with CUDA compilation enabled:

```bash
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir
```

## Usage

### Initializing the Solver

```python
from ovos_gguf_solver import GGUFSolver
from ovos_utils.log import LOG

LOG.set_level("DEBUG")

cfg = {
    "model": "TheBloke/notus-7B-v1-GGUF",
    "remote_filename": "*Q4_K_M.gguf"
}

solver = GGUFSolver(cfg)
```

### Streaming Utterances

Use the `stream_utterances` method to stream responses. This is particularly useful for real-time applications such as
voice assistants.

```python
query = "tell me a joke about aliens"
for sentence in solver.stream_utterances(query):
    print(sentence)
```
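
As a minimal sketch using only the method shown above, each streamed sentence can be handled as soon as it arrives (for example, handed off to TTS) while the full reply is assembled in parallel:

```python
# React to each sentence as it is produced, while keeping the full reply.
sentences = []
for sentence in solver.stream_utterances("tell me a joke about aliens"):
    print(sentence)            # e.g. speak this sentence right away
    sentences.append(sentence)

full_reply = " ".join(sentences)
```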

### Getting a Full Answer

Use the `get_spoken_answer` method to get a complete response.

```python
query = "What is the capital of France?"
answer = solver.get_spoken_answer(query)
print(answer)
```

## Integrating with Persona Framework

To integrate `GGUFSolver` with the OVOS Persona Framework and pass solver configurations, follow these examples.

Each example demonstrates how to define a persona configuration file with specific settings for different models or configurations.

To use any of these configurations, run the OVOS Persona Server with the desired configuration file:

```bash
$ ovos-persona-server --persona gguf_persona_remote.json
```

Replace `gguf_persona_remote.json` with the filename of the configuration you wish to use.

### Example 1: Using a Remote GGUF Model

This example shows how to configure the `GGUFSolver` to use a remote GGUF model with a specific persona.

**`gguf_persona_remote.json`**:

```json
{
  "name": "Notus",
  "solvers": [
    "ovos-solver-gguf-plugin"
  ],
  "ovos-solver-gguf-plugin": {
    "model": "TheBloke/notus-7B-v1-GGUF",
    "remote_filename": "*Q4_K_M.gguf",
    "persona": "You are an advanced assistant providing detailed and accurate information.",
    "verbose": true
  }
}
```

In this configuration:
- `ovos-solver-gguf-plugin` is set to use a remote GGUF model `TheBloke/notus-7B-v1-GGUF` with the specified filename.
- The persona is configured to provide detailed and accurate information.
- `verbose` is set to `true` for detailed logging.

### Example 2: Using a Local GGUF Model

This example shows how to configure the `GGUFSolver` to use a local GGUF model.

**`gguf_persona_local.json`**:

```json
{
  "name": "LocalGGUFPersona",
  "solvers": [
    "ovos-solver-gguf-plugin"
  ],
  "ovos-solver-gguf-plugin": {
    "model": "/path/to/local/model/gguf_model.gguf",
    "persona": "You are a helpful assistant providing concise answers.",
    "max_tokens": 256
  }
}
```

In this configuration:
- `ovos-solver-gguf-plugin` is set to use a local GGUF model located at `/path/to/local/model/gguf_model.gguf`.
- The persona is configured to provide concise answers.
- `max_tokens` is set to `256` to limit the response length.
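
The same local-model settings also work when instantiating `GGUFSolver` directly; a minimal sketch (the path below is a placeholder):

```python
from ovos_gguf_solver import GGUFSolver

solver = GGUFSolver({
    "model": "/path/to/local/model/gguf_model.gguf",  # placeholder local path
    "persona": "You are a helpful assistant providing concise answers.",
    "max_tokens": 256,
})
print(solver.get_spoken_answer("What is the capital of France?"))
```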


### Example Models

These models are not endorsed; the list was compiled largely by searching Hugging Face and is provided for illustrative purposes only.

| Language   | Model Name                                          | URL                                                                                              | Description                                                                                                                                                                                                                                                                                                                                                                                             |
|------------|-----------------------------------------------------|--------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| English    | CausalLM-14B-GGUF                                   | [Link](https://huggingface.co/TheBloke/CausalLM-14B-GGUF)                                        | A 14B parameter model compatible with Meta LLaMA 2, demonstrating top-tier performance among models with fewer than 70B parameters, optimized for both qualitative and quantitative evaluations, with strong consistency across versions.                                                                                                                                                               |
| English    | Phi-3-Mini-4K-Instruct-GGUF                         | [Link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf)                             | A lightweight 3.8B parameter model from the Phi-3 family, optimized for strong reasoning and long-context tasks with robust performance in instruction adherence and logical reasoning.                                                                                                                                                                                                                 |
| English    | Qwen2-0.5B-Instruct-GGUF                            | [Link](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GGUF)                                     | A 0.5B parameter instruction-tuned model from the Qwen2 series, excelling in language understanding, generation, and multilingual tasks with competitive performance against state-of-the-art models.                                                                                                                                                                                                   |
| English    | GritLM_-_GritLM-7B-gguf                             | [Link](https://huggingface.co/RichardErkhov/GritLM_-_GritLM-7B-gguf)                             | A unified model for both text generation and embedding tasks, achieving state-of-the-art performance in both areas and enhancing Retrieval-Augmented Generation (RAG) efficiency by over 60%.                                                                                                                                                                                                           |
| English    | falcon-7b-instruct-GGUF                             | [Link](https://huggingface.co/QuantFactory/falcon-7b-instruct-GGUF)                              | A 7B parameter instruct model based on Falcon-7B, optimized for chat and instruction tasks with performance benefits from extensive training on 1,500B tokens, and optimized inference architecture.                                                                                                                                                                                                    |
| English    | Samantha-Qwen-2-7B-GGUF                             | [Link](https://huggingface.co/QuantFactory/Samantha-Qwen-2-7B-GGUF)                              | A quantized 7B parameter model fine-tuned with QLoRa and FSDP, tailored for conversational tasks and utilizing datasets like OpenHermes-2.5 and Opus_Samantha.                                                                                                                                                                                                                                          |
| English    | Mistral-7B-Instruct-v0.3-GGUF                       | [Link](https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF)                       | An instruct fine-tuned model based on Mistral-7B-v0.3, featuring an extended vocabulary and support for function calling, aimed at demonstrating effective fine-tuning with room for improved moderation mechanisms.                                                                                                                                                                                    |
| English    | Lite-Mistral-150M-v2-Instruct-GGUF                  | [Link](https://huggingface.co/OuteAI/Lite-Mistral-150M-v2-Instruct-GGUF)                         | A compact 150M parameter model optimized for efficiency on various devices, demonstrating reasonable performance in simple queries but facing challenges with context preservation and accuracy in multi-turn conversations.                                                                                                                                                                            |
| English    | TowerInstruct-7B-v0.1-GGUF                          | [Link](https://huggingface.co/TheBloke/TowerInstruct-7B-v0.1-GGUF)                               | A 7B parameter model fine-tuned on the TowerBlocks dataset for translation tasks, including general, context-aware, and terminology-aware translation, as well as named-entity recognition and grammatical error correction.                                                                                                                                                                            |
| English    | Dr_Samantha-7B-GGUF                                 | [Link](https://huggingface.co/TheBloke/Dr_Samantha-7B-GGUF)                                      | A merged model incorporating medical and psychological knowledge, with extensive performance on medical knowledge tasks and a focus on whole-person care.                                                                                                                                                                                                                                               |
| English    | phi-2-orange-GGUF                                   | [Link](https://huggingface.co/rhysjones/phi-2-orange)                                            | A finetuned model based on Phi-2, optimized with a two-step finetuning approach for improved performance in various evaluation metrics. The model is designed for Python-related tasks and general question answering.                                                                                                                                                                                  |
| English    | phi-2-electrical-engineering-GGUF                   | [Link](https://huggingface.co/TheBloke/phi-2-electrical-engineering-GGUF)                        | The phi-2-electrical-engineering model excels in answering questions and generating code specifically for electrical engineering and Kicad software, boasting efficient deployment and a focus on technical accuracy within its 2.7 billion parameters.                                                                                                                                                 |
| English    | Unholy-v2-13B-GGUF                                  | [Link](https://huggingface.co/TheBloke/Unholy-v2-13B-GGUF)                                       | An uncensored 13B parameter model merged with various models for an uncensored experience, designed to bypass typical content moderation filters.                                                                                                                                                                                                                                                       |
| English    | CapybaraHermes-2.5-Mistral-7B-GGUF                  | [Link](https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF)                       | A preference-tuned 7B model using distilabel, optimized for multi-turn performance with improved scores in benchmarks like MTBench and Nous, compared to the Mistral-7B-Instruct-v0.2.                                                                                                                                                                                                                  |
| English    | notus-7B-v1-GGUF                                    | [Link](https://huggingface.co/TheBloke/notus-7B-v1-GGUF)                                         | A 7B parameter model fine-tuned with Direct Preference Optimization (DPO), surpassing Zephyr-7B-beta and Claude 2 on AlpacaEval, designed for chat-like applications with improved preference-based performance.                                                                                                                                                                                        |
| English    | Luna AI Llama2 Uncensored GGML                      | [Link](https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGML)                           | A Llama2-based chat model fine-tuned on over 40,000 long-form chat discussions. Optimized with synthetic outputs, available in both 4-bit GPTQ for GPU and GGML for CPU inference. Prompt format follows Vicuna 1.1/OpenChat style.                                                                                                                                                                     |
| English    | Zephyr-7B-β-GGUF                                    | [Link](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF)                                      | A 7B parameter model fine-tuned with Direct Preference Optimization (DPO) to enhance performance, optimized for helpfulness but may generate problematic text due to removed in-built alignment.                                                                                                                                                                                                        |
| English    | TinyLlama-1.1B-1T-OpenOrca-GGUF                     | [Link](https://huggingface.co/TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF)                          | A 1.1B parameter model fine-tuned on the OpenOrca GPT-4 subset, optimized for conversational tasks with a focus on efficiency and performance in the CHATML format.                                                                                                                                                                                                                                     |
| English    | LlongOrca-7B-16K-GGUF                               | [Link](https://huggingface.co/TheBloke/LlongOrca-7B-16K-GGUF)                                    | A fine-tuned 7B parameter model optimized for long contexts, achieving top performance in long-context benchmarks and notable improvements over the base model, with efficient training using OpenChat's MultiPack algorithm.                                                                                                                                                                           |
| English    | Meta-Llama-3-8B-Instruct-GGUF                       | [Link](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF)                        | An 8B parameter instruction-tuned model from the Llama 3 series, optimized for dialogue and outperforming many open-source models on industry benchmarks, with a focus on helpfulness and safety through advanced fine-tuning techniques.                                                                                                                                                               |
| English    | Smol-7B-GGUF                                        | [Link](https://huggingface.co/TheBloke/smol-7B-GGUF)                                             | A fine-tuned 7B parameter model from the Smol series, known for its strong performance in diverse NLP tasks and efficient fine-tuning techniques.                                                                                                                                                                                                                                                       |
| English    | Smol-Llama-101M-Chat-v1-GGUF                        | [Link](https://huggingface.co/afrideva/Smol-Llama-101M-Chat-v1-GGUF)                             | A compact 101M parameter chat model optimized for diverse conversational tasks, showing balanced performance across multiple benchmarks with a focus on efficiency and low-resource scenarios.                                                                                                                                                                                                          |
| English    | Sonya-7B-GGUF                                       | [Link](https://huggingface.co/TheBloke/Sonya-7B-GGUF)                                            | A high-performing 7B model with excellent scores in MT-Bench, ideal for various tasks including assistant and roleplay, combining multiple sources to achieve superior performance.                                                                                                                                                                                                                     |
| English    | WizardLM-7B-uncensored-GGML                         | [Link](https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML)                              | An uncensored 7B parameter model from the WizardLM series, designed without built-in alignment to allow for custom alignment via techniques like RLHF LoRA, with no guardrails and complete responsibility for content usage resting on the user.                                                                                                                                                       |
| English    | OpenChat 3.5                                        | [Link](https://huggingface.co/TheBloke/openchat_3.5-GGUF)                                        | A 7B parameter model that achieves comparable results with ChatGPT, excelling in MT-bench evaluations. It utilizes mixed-quality data and C-RLFT (a variant of offline reinforcement learning) for training. OpenChat 3.5 performs well across various benchmarks and has been optimized for high-throughput deployment. It is an open-source model with strong performance in chat-based applications. |
| Portuguese | PORTULAN_-_gervasio-7b-portuguese-ptpt-decoder-gguf | [Link](https://huggingface.co/RichardErkhov/PORTULAN_-_gervasio-7b-portuguese-ptpt-decoder-gguf) | Gervásio 7B PTPT is an open decoder for Portuguese, built on the LLaMA-2 7B model, fine-tuned with instruction data to excel in various Portuguese tasks, and designed to run on consumer-grade hardware with a focus on European Portuguese.                                                                                                                                                           |
| Portuguese | CabraLlama3-8b-GGUF                                 | [Link](https://huggingface.co/mradermacher/CabraLlama3-8b-GGUF)                                  | A refined version of Meta-Llama-3-8B-Instruct, optimized with the Cabra 30k dataset for understanding and responding in Portuguese, providing enhanced performance for Portuguese language tasks.                                                                                                                                                                                                       |
| Portuguese | bode-7b-alpaca-pt-br-gguf                           | [Link](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-gguf)                             | Bode-7B is a fine-tuned LLaMA 2-based model designed for Portuguese, delivering satisfactory results in classification tasks and prompt-based applications.                                                                                                                                                                                                                                             |
| Portuguese | bode-13b-alpaca-pt-br-gguf                          | [Link](https://huggingface.co/recogna-nlp/bode-13b-alpaca-pt-br-gguf)                            | Bode-13B is a fine-tuned LLaMA 2-based model for Portuguese prompts, offering enhanced performance over its 7B counterpart, and designed for both research and commercial applications with a focus on Portuguese language tasks.                                                                                                                                                                       |
| Portuguese | sabia-7B-GGUF                                       | [Link](https://huggingface.co/TheBloke/sabia-7B-GGUF)                                            | Sabiá-7B is a Portuguese auto-regressive language model based on LLaMA-1-7B, pretrained on a large Portuguese dataset, offering high performance in few-shot tasks and generating text, with research-only licensing.                                                                                                                                                                                   |
| Portuguese | OpenHermesV2-PTBR-portuguese-brazil-gguf            | [Link](https://huggingface.co/skoll520/OpenHermesV2-PTBR-portuguese-brazil-gguf)                 | A finetuned version of Mistral 7B trained on diverse GPT-4 generated data, designed for Portuguese, with extensive filtering and transformation for enhanced performance.                                                                                                                                                                                                                               |
| Catalan    | CataLlama-v0.2-Instruct-SFT-DPO-Merged-GGUF         | [Link](https://huggingface.co/catallama/CataLlama-v0.2-Instruct-SFT-DPO-Merged-GGUF)             | An instruction-tuned model optimized with DPO for various NLP tasks in Catalan, including translation, NER, summarization, and sentiment analysis, built on an auto-regressive transformer architecture.                                                                                                                                                                                                |

> The models listed are suggestions. The best model for your use case will depend on your specific requirements such as
> language, task complexity, and performance needs.
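
For instance, switching to another model from the table, or to a different quantization of the same repository, usually only means changing `model` and `remote_filename`. A hedged sketch (verify the exact filenames in the repository's file listing):

```python
cfg = {
    "model": "TheBloke/zephyr-7B-beta-GGUF",
    "remote_filename": "*Q5_K_M.gguf",  # assumed quantization variant; confirm it exists in the repo
}
```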



            
