sciphi-synthesizer


Name: sciphi-synthesizer
Version: 1.0.5
Summary: Synthesizer: A Framework for LLM Powered Data.
Upload time: 2024-01-03 20:54:40
Author: Owen Colegrove
Requires Python: >=3.9,<3.12
License: Apache-2.0
Requirements: No requirements were recorded.
# Synthesizer[ΨΦ]: A multi-purpose LLM framework 💡

<p align="center">
<img width="716" alt="SciPhi Logo" src="https://github.com/emrgnt-cmplxty/sciphi/assets/68796651/195367d8-54fd-4281-ace0-87ea8523f982">
</p>

With Synthesizer, users can:

- **Custom Data Creation**: Generate datasets via LLMs that are tailored to your needs, for LLM training, RAG, and more.
   - Supported LLM providers: Anthropic, OpenAI, vLLM, and HuggingFace.
- **Retrieval-Augmented Generation (RAG) on Demand**: Use the built-in RAG Provider Interface to anchor generated data to real-world sources.
   - Turnkey integration with the Agent Search API.

---

## Fast Setup

```bash
pip install sciphi-synthesizer
```
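
The package metadata declares `requires_python` as `>=3.9,<3.12`, so installation may fail on other interpreters. The following is a minimal sketch for checking the running interpreter first; the helper name `python_is_supported` is illustrative and not part of Synthesizer.

```python
import sys

# Bounds taken from the package metadata: requires_python ">=3.9,<3.12".
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 12)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is inside the declared range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    if python_is_supported():
        print(f"Python {current} is within >=3.9,<3.12")
    else:
        print(f"Python {current} is outside >=3.9,<3.12")
```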

### Using Synthesizer

1. **Generate synthetic question-answer pairs**

   ```bash
   export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
   python -m synthesizer.scripts.data_augmenter run --dataset="wiki_qa"
   ```

   ```bash
   tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl
   { "formatted_prompt": "... ### Question:\nwhat country did wine originate in\n\n### Input:\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\nTitle:History of wine....",
   { "completion": "Wine originated in the South Caucasus, which is now part of modern-day Armenia ..."
   ```

2. **Evaluate RAG pipeline performance**

   ```bash
   export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
   python -m synthesizer.scripts.rag_harness --rag_provider="agent-search" --llm_provider_name="sciphi" --n_samples=25
   ```
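
The output file from step 1 is JSON Lines, so it can be inspected from Python as well as with `tail`. The sketch below assumes that each line is one JSON object and that the `formatted_prompt` and `completion` keys seen in the truncated sample above are present; the helper names `load_records` and `preview` are hypothetical and not part of Synthesizer.

```python
import json
from pathlib import Path


def load_records(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def preview(records, limit=3, width=80):
    """Return shortened 'completion' strings for the first few records."""
    # .get() is used because the exact record schema is an assumption.
    return [str(r.get("completion", ""))[:width] for r in records[:limit]]
```

Calling `preview(load_records("augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl"))` would then list the first few completions, provided the run in step 1 wrote that file.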

### Documentation

For more detailed information, tutorials, and API references, please visit the official [Synthesizer Documentation](https://sciphi.readthedocs.io/en/latest/).

### Community & Support

- Engage with our vibrant community on [Discord](https://discord.gg/j9GxfbxqAe).
- For tailored inquiries or feedback, please [email us](mailto:owen@sciphi.ai).

### Developing with Synthesizer

Quickly set up retrieval-augmented generation with your choice of LLM provider: OpenAI, Anthropic, vLLM, or SciPhi.

```python
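# Requires SCIPHI_API_KEY in env.
# The OpenAI provider likely needs its own key in env as well (assumption).

from synthesizer.core import LLMProviderName, RAGProviderName
from synthesizer.interface import LLMInterfaceManager, RAGInterfaceManager
from synthesizer.llm import GenerationConfig

# Placeholder inputs: illustrative values, not library defaults.
query = "What country did wine originate in?"
raw_prompt = "### Question:\n" + query + "\n\n### Input:\n{rag_context}\n"
rag_limit_hierarchical_url_results = 50
rag_limit_final_pagerank_results = 20
llm_model_name = "gpt-3.5-turbo"
llm_max_tokens_to_sample = 256
llm_temperature = 0.1
llm_top_p = 1.0

# RAG Provider Settings
rag_interface = RAGInterfaceManager.get_interface_from_args(
    RAGProviderName("agent-search"),
    limit_hierarchical_url_results=rag_limit_hierarchical_url_results,
    limit_final_pagerank_results=rag_limit_final_pagerank_results,
)
# Retrieve context for the query from the RAG provider
rag_context = rag_interface.get_rag_context(query)

# LLM Provider Settings
llm_interface = LLMInterfaceManager.get_interface_from_args(
    LLMProviderName("openai"),
)

generation_config = GenerationConfig(
    model_name=llm_model_name,
    max_tokens_to_sample=llm_max_tokens_to_sample,
    temperature=llm_temperature,
    top_p=llm_top_p,
    # other generation params here ...
)

# Insert the retrieved context into the prompt, then request a completion
formatted_prompt = raw_prompt.format(rag_context=rag_context)
completion = llm_interface.get_completion(formatted_prompt, generation_config)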
```
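
The flow above is: retrieve context, fill the prompt template, then request a completion. Here is a minimal sketch of that flow as one reusable function. It relies only on the two methods shown above (`get_rag_context` and `get_completion`). The function name `answer_with_rag`, the `Protocol` classes, the string return types, the template, and the stub classes are all hypothetical; the stubs exist only so the sketch runs without network access or API keys.

```python
from typing import Any, Protocol


class RAGLike(Protocol):
    # Return type is an assumption; the source only shows the call itself.
    def get_rag_context(self, query: str) -> str: ...


class LLMLike(Protocol):
    def get_completion(self, prompt: str, generation_config: Any) -> str: ...


def answer_with_rag(query, rag: RAGLike, llm: LLMLike, generation_config, template):
    """Retrieve context for `query`, fill `template`, and return the completion."""
    rag_context = rag.get_rag_context(query)
    prompt = template.format(query=query, rag_context=rag_context)
    return llm.get_completion(prompt, generation_config)


# Offline stand-ins (not part of Synthesizer) that mimic the two methods.
class StubRAG:
    def get_rag_context(self, query):
        return f"1. Title: notes about {query}"


class StubLLM:
    def get_completion(self, prompt, generation_config):
        return f"[completion for a {len(prompt)}-character prompt]"


# Hypothetical template with both placeholders filled by answer_with_rag.
TEMPLATE = "### Question:\n{query}\n\n### Input:\n{rag_context}\n"

if __name__ == "__main__":
    print(answer_with_rag("wine", StubRAG(), StubLLM(), None, TEMPLATE))
```

Passing the interfaces in as arguments keeps the function independent of any one provider, so the stubs could be swapped for the `rag_interface` and `llm_interface` objects built earlier, assuming they expose the same two methods.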
            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "sciphi-synthesizer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<3.12",
    "maintainer_email": "",
    "keywords": "",
    "author": "Owen Colegrove",
    "author_email": "owen@sciphi.ai",
    "download_url": "https://files.pythonhosted.org/packages/46/50/08020d06f4453916627e2a9e62e873cadbb7f21455a0ce97d22c8de7813c/sciphi_synthesizer-1.0.5.tar.gz",
    "platform": null,
    "description": "# Synthesizer[\u03a8\u03a6]: A multi-purpose LLM framework \ud83d\udca1\n\n<p align=\"center\">\n<img width=\"716\" alt=\"SciPhi Logo\" src=\"https://github.com/emrgnt-cmplxty/sciphi/assets/68796651/195367d8-54fd-4281-ace0-87ea8523f982\">\n</p>\n\nWith Synthesizer, users can:\n\n- **Custom Data Creation**: Generate datasets via LLMs that are tailored to your needs.\n   - Anthropic, OpenAI, vLLM, and HuggingFace.\n- **Retrieval-Augmented Generation (RAG) on Demand**: Built-in RAG Provider Interface to anchor generated data to real-world sources. \n   - Turnkey integration with Agent Search API. \n- **Custom Data Creation**: Generate datasets via LLMs that are tailored to your needs, for LLM training, RAG, and more.\n\n---\n\n## Fast Setup\n\n```bash\npip install sciphi-synthesizer\n```\n\n### Using Synthesizer\n\n1. **Generate synthetic question-answer pairs**\n\n   ```bash\n   export SCIPHI_API_KEY=MY_SCIPHI_API_KEY\n   python -m synthesizer.scripts.data_augmenter run --dataset=\"wiki_qa\"\n   ```\n\n   ```bash\n   tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl\n   { \"formatted_prompt\": \"... ### Question:\\nwhat country did wine originate in\\n\\n### Input:\\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\\nTitle:History of wine....\",\n   { \"completion\": \"Wine originated in the South Caucasus, which is now part of modern-day Armenia ...\"\n   ```\n\n2. 
**Evaluate RAG pipeline performance**\n\n   ```bash\n   export SCIPHI_API_KEY=MY_SCIPHI_API_KEY\n   python -m synthesizer.scripts.rag_harness --rag_provider=\"agent-search\" --llm_provider_name=\"sciphi\" --n_samples=25\n   ```\n\n### Documentation\n\nFor more detailed information, tutorials, and API references, please visit the official [Synthesizer Documentation](https://sciphi.readthedocs.io/en/latest/).\n\n### Community & Support\n\n- Engage with our vibrant community on [Discord](https://discord.gg/j9GxfbxqAe).\n- For tailored inquiries or feedback, please [email us](mailto:owen@sciphi.ai).\n\n### Developing with Synthesizer\n\nQuickly set up RAG augmented generation with your choice of provider, from OpenAI, Anhtropic, vLLM, and SciPhi:\n\n```python\n# Requires SCIPHI_API_KEY in env\n\nfrom synthesizer.core import LLMProviderName, RAGProviderName\nfrom synthesizer.interface import LLMInterfaceManager, RAGInterfaceManager\nfrom synthesizer.llm import GenerationConfig\n\n# RAG Provider Settings\nrag_interface = RAGInterfaceManager.get_interface_from_args(\n    RAGProviderName(\"agent-search\"),\n    limit_hierarchical_url_results=rag_limit_hierarchical_url_results,\n    limit_final_pagerank_results=rag_limit_final_pagerank_results,\n)\nrag_context = rag_interface.get_rag_context(query)\n\n# LLM Provider Settings\nllm_interface = LLMInterfaceManager.get_interface_from_args(\n    LLMProviderName(\"openai\"),\n)\n\ngeneration_config = GenerationConfig(\n    model_name=llm_model_name,\n    max_tokens_to_sample=llm_max_tokens_to_sample,\n    temperature=llm_temperature,\n    top_p=llm_top_p,\n    # other generation params here ...\n)\n\nformatted_prompt = raw_prompt.format(rag_context=rag_context)\ncompletion = llm_interface.get_completion(formatted_prompt, generation_config)\n```",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Synthesizer: A Framework for LLM Powered Data.",
    "version": "1.0.5",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a1ce5e9562d6b76911186f1555b21a439ced43c2a5c18c9e5693146d807ef3fa",
                "md5": "794b6fe1f8d6c64ec9f0951cd59c8b9f",
                "sha256": "be1a736e8e7c7ed41f2ec2d639d1f59be46dad871e52b8e28c37798419a2aa5a"
            },
            "downloads": -1,
            "filename": "sciphi_synthesizer-1.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "794b6fe1f8d6c64ec9f0951cd59c8b9f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<3.12",
            "size": 164030,
            "upload_time": "2024-01-03T20:54:38",
            "upload_time_iso_8601": "2024-01-03T20:54:38.198027Z",
            "url": "https://files.pythonhosted.org/packages/a1/ce/5e9562d6b76911186f1555b21a439ced43c2a5c18c9e5693146d807ef3fa/sciphi_synthesizer-1.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "465008020d06f4453916627e2a9e62e873cadbb7f21455a0ce97d22c8de7813c",
                "md5": "668df494a0e06d321549236a21ac49c0",
                "sha256": "c8f5db3e4b222edb501d7096579bac6d2659fdbc180bbdd7b6e726eb6eedd6c5"
            },
            "downloads": -1,
            "filename": "sciphi_synthesizer-1.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "668df494a0e06d321549236a21ac49c0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<3.12",
            "size": 144472,
            "upload_time": "2024-01-03T20:54:40",
            "upload_time_iso_8601": "2024-01-03T20:54:40.378431Z",
            "url": "https://files.pythonhosted.org/packages/46/50/08020d06f4453916627e2a9e62e873cadbb7f21455a0ce97d22c8de7813c/sciphi_synthesizer-1.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-03 20:54:40",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "sciphi-synthesizer"
}
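
The `digests` entries above make it possible to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `sha256_matches` are illustrative, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(path, expected_hex):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution listed above, the call would be `sha256_matches("sciphi_synthesizer-1.0.5.tar.gz", "c8f5db3e4b222edb501d7096579bac6d2659fdbc180bbdd7b6e726eb6eedd6c5")`.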
        