| Field | Value |
| --- | --- |
| Name | SciPhi |
| Version | 0.1.0 |
| Summary | SciPhi: A framework for synthetic data. |
| Author | Owen Colegrove |
| License | Apache-2.0 |
| Requires Python | >=3.9,<3.12 |
| Upload time | 2023-10-23 15:44:36 |
# SciPhi [<span style="color:gold">ΨΦ</span>]: A Framework for LLM Powered Data
<p align="center">
<img width="716" alt="SciPhi Logo" src="https://github.com/emrgnt-cmplxty/sciphi/assets/68796651/195367d8-54fd-4281-ace0-87ea8523f982">
</p>
SciPhi is a Python-based framework designed to facilitate the generation of high-quality synthetic data tailored for both Large Language Models (LLMs) and human users. This suite offers:
- **Configurable Data Generation:** Craft datasets mediated by LLMs according to your specifications.
- **Retrieval-Augmented Generation (RAG) Integration:** Use the integrated RAG provider API, which ships with an evaluation harness for grounding your generated data in real-world datasets.
- **Textbook Generation Module:** Generate RAG-augmented synthetic textbooks directly from a given table of contents.
---
## Fast Setup
Install SciPhi via `pip`:
### Base Installation:
```bash
pip install sciphi
```
### Optional Dependencies:
Install with specific optional support using extras:
- **Anthropic**: `'sciphi[anthropic_support]'`
- **HF (includes Torch)**: `'sciphi[hf_support]'`
- **Llama-CPP**: `'sciphi[llama_cpp_support]'`
- **Llama-Index**: `'sciphi[llama_index_support]'`
- **VLLM (includes Torch)**: `'sciphi[vllm_support]'`
### Recommended (All Optional Dependencies):
```bash
pip install 'sciphi[all_with_extras]'
```
Note: Depending on your shell, you might need to use quotes around the package name and extras to avoid globbing.
---
- Join our [Discord community](https://discord.gg/j9GxfbxqAe) for discussions and collaboration.
- For specialized inquiries, [email us](mailto:owen@sciphi.ai).
## Features
### Textbook Generation (The Library of Phi)
This is an effort to democratize access to top-tier textbooks. By leveraging cutting-edge AI techniques, we aim to produce factual and high-quality educational materials. This can readily be extended to other domains, such as internal commercial documents.
#### Generating Textbooks
1. **Dry Run**:
```bash
python sciphi/scripts/generate_textbook.py dry_run
```
2. **Default Textbook Generation**:
```bash
python sciphi/scripts/generate_textbook.py run --textbook=Aerodynamics_of_Viscous_Fluids --rag-enabled=False --filter_existing_books=False --log-level=debug
```
Use the `--rag-enabled` flag to toggle RAG augmentation of the textbook, and customize the RAG provider through additional arguments.
See a [sample output here.](sciphi/data/library_of_phi/sample/Aerodynamics_of_Viscous_Fluids.md)
3. **Crafting with a Custom Table of Contents**:
Prepare your table of contents and save it as `textbook_name.yaml`. Then, move this file to the recommended directory.
4. **Activating RAG Functionality**:
Simply switch `rag-enabled` to `True`. Ensure you have the right `.env` variables set up, or provide CLI values for `rag_api_base` and `rag_api_key`.
_Important:_ To make the most out of grounding your data with Wikipedia, ensure your system matches our detailed specifications. We offer additional examples and resources [here](https://github.com/emrgnt-cmplxty/library_of_phi/tree/main).
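The precedence between CLI values and `.env` variables in step 4 can be sketched as follows. This is a minimal illustration, assuming hypothetical environment variable names (`RAG_API_BASE`, `RAG_API_KEY`); it is not SciPhi's internal code:

```python
import os

def resolve_rag_config(cli_api_base=None, cli_api_key=None):
    """Resolve RAG settings: an explicit CLI value wins, otherwise fall
    back to the environment (e.g. variables loaded from a .env file).
    The variable names here are illustrative assumptions."""
    api_base = cli_api_base or os.getenv("RAG_API_BASE", "")
    api_key = cli_api_key or os.getenv("RAG_API_KEY", "")
    if not api_base or not api_key:
        raise ValueError("RAG is enabled but rag_api_base / rag_api_key are unset")
    return {"rag_api_base": api_base, "rag_api_key": api_key}
```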
### RAG Eval Harness
To measure the efficacy of your RAG pipeline, we provide a dedicated RAG evaluation harness.
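Conceptually, an evaluation like `science_multiple_choice` reduces to scoring model answers against ground truth. A minimal sketch of that accuracy computation (the function name is hypothetical, not SciPhi's API):

```python
def score_multiple_choice(predictions, answers):
    """Fraction of multiple-choice predictions that match the ground
    truth. Illustrative only -- the real harness also handles sampling,
    prompting, and RAG-augmented contexts."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```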
#### Deploying the RAG Harness
1. **Initiate the Harness**:
```bash
poetry run python sciphi/scripts/rag_harness.py --n-samples=100 --rag-enabled=True --evals_to_run="science_multiple_choice"
```
---
## Local Development
1. **Clone the Repository**:
Begin by cloning the repository and stepping into the project directory:
```bash
git clone https://github.com/emrgnt-cmplxty/sciphi.git
cd sciphi
```
2. **Install the Dependencies**:
Start by installing the primary requirements:
```bash
pip install -r requirements.txt
```
If you require further functionalities, consider the following:
- For the developer's toolkit and utilities:
```bash
pip install -r requirements-dev.txt
```
- To encompass all optional dependencies:
```bash
pip install -r requirements_all.txt
```
Alternatively, to manage packages using Poetry:
```bash
poetry install
```
And for optional dependencies with Poetry, pass the extras group you need to `-E`:
```bash
poetry install -E all            # or: poetry install -E all_with_extras
```
3. **Setting Up Your Environment**:
Begin by duplicating the sample environment file to craft your own:
```bash
cp .env.example .env
```
Next, use a text editor to adjust the `.env` file with your specific configurations. An example with `vim` is shown below:
```bash
vim .env
```
After entering your settings, ensure you save and exit the file.
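For reference, a `.env` file holds one `KEY=value` pair per line. The keys below are illustrative placeholders only; consult `.env.example` for the actual settings SciPhi expects:

```bash
# Illustrative example -- see .env.example for the real keys.
OPENAI_API_KEY=your-key-here
RAG_API_BASE=https://your-rag-provider.example.com
RAG_API_KEY=your-rag-key-here
```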
---
## System Requirements
### Essential Packages:
- **Python Version**: `>=3.9,<3.12`
- **Required Libraries**:
- `bs4`: `^0.0.1`
- `fire`: `^0.5.0`
- `openai`: `0.27.8`
- `pandas`: `^2.1.0`
- `python-dotenv`: `^1.0.0`
- `pyyaml`: `^6.0.1`
- `retrying`: `^1.3.4`
- `sentencepiece`: `^0.1.99`
- `torch`: `^2.1.0`
- `tiktoken`: `^0.5.1`
- `tqdm`: `^4.66.1`
### Supplementary Packages:
- **Anthropic Integration**:
- `anthropic`: `^0.3.10`
- **Hugging Face Tools**:
- `accelerate`: `^0.23.0`
- `datasets`: `^2.14.5`
- `transformers`: `^4.33.1`
- **Llama-Index**:
- `llama-index`: `^0.8.29.post1`
- **Llama-CPP**:
- `llama-cpp-python`: `^0.2.11`
- **VLLM Tools**:
- `vllm`: `0.2.0`
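The `^` markers above are Poetry caret constraints: they allow any upgrade that does not change the left-most non-zero version component, so `^2.1.0` means `>=2.1.0,<3.0.0` while `^0.5.0` means `>=0.5.0,<0.6.0`. A small sketch of that rule for plain numeric versions (it does not handle suffixed versions like `0.8.29.post1`):

```python
def caret_to_range(spec: str) -> str:
    """Translate a Poetry caret constraint like '^2.1.0' into an
    explicit version range. Handles purely numeric versions only."""
    version = spec.lstrip("^")
    parts = [int(p) for p in version.split(".")]
    # Left-most non-zero component (fall back to the last component
    # for versions like 0.0.1).
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1] + [0] * (len(parts) - idx - 1)
    return f">={version},<{'.'.join(map(str, upper))}"
```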
---
## Licensing and Acknowledgment
This project is licensed under the [Apache-2.0 License](./LICENSE).
### Citing Our Work
If SciPhi plays a role in your research, we kindly ask you to acknowledge us with the following citation:
```bibtex
@software{SciPhi,
  author = {Colegrove, Owen},
  doi = {Pending},
  month = {09},
  title = {{SciPhi: A Framework for LLM Powered Data}},
  url = {https://github.com/sciphi-ai/sciphi},
  year = {2023}
}
```