SciPhi (PyPI)

- **Name:** SciPhi
- **Version:** 0.1.0
- **Summary:** SciPhi: A framework for synthetic data.
- **Author:** Owen Colegrove
- **Requires Python:** `>=3.09,<3.12`
- **License:** Apache-2.0
- **Uploaded:** 2023-10-23 15:44:36
            # SciPhi [<span style="color:gold">ΨΦ</span>]: A Framework for LLM Powered Data

<p align="center">
<img width="716" alt="SciPhi Logo" src="https://github.com/emrgnt-cmplxty/sciphi/assets/68796651/195367d8-54fd-4281-ace0-87ea8523f982">
</p>

SciPhi is a Python-based framework for generating high-quality synthetic data tailored to both Large Language Models (LLMs) and human users. The suite offers:

- **Configurable Data Generation:** Craft LLM-mediated datasets according to your specifications.
- **Retrieval-Augmented Generation (RAG) Integration:** Use the integrated RAG Provider API, along with a bundled evaluation harness for grounding your generated data in real-world datasets.
- **Textbook Generation Module:** Generate RAG-augmented synthetic textbooks directly from a given table of contents.


---

## Fast Setup

Install SciPhi via `pip`:

### Base Installation:

```bash
pip install sciphi
```


### Optional Dependencies:

Install with specific optional support using extras:

- **Anthropic**: `'sciphi[anthropic_support]'`
- **HF (includes Torch)**: `'sciphi[hf_support]'`
- **Llama-CPP**: `'sciphi[llama_cpp_support]'`
- **Llama-Index**: `'sciphi[llama_index_support]'`
- **VLLM (includes Torch)**: `'sciphi[vllm_support]'`

### Recommended (All Optional Dependencies):

```bash
pip install 'sciphi[all_with_extras]'
```

Note: Depending on your shell, you might need to use quotes around the package name and extras to avoid globbing.

---

- Join our [Discord community](https://discord.gg/j9GxfbxqAe) for discussions and collaboration.
- For specialized inquiries, [email us](mailto:owen@sciphi.ai).

## Features

### Community & Support

- Engage with our vibrant community on [Discord](https://discord.gg/j9GxfbxqAe).
- For tailored inquiries or feedback, please [email us](mailto:owen@sciphi.ai).

### Textbook Generation (The Library of Phi)

This is an effort to democratize access to top-tier textbooks. By leveraging cutting-edge AI techniques, we aim to produce factual and high-quality educational materials. This can readily be extended to other domains, such as internal commercial documents.

#### Generating Textbooks

1. **Dry Run**:
   ```bash
   python sciphi/scripts/generate_textbook.py dry_run
   ```

2. **Default Textbook Generation**:
   ```bash
   python sciphi/scripts/generate_textbook.py run --textbook=Aerodynamics_of_Viscous_Fluids --rag-enabled=False --filter_existing_books=False --log-level=debug
   ```
   
   Use the `--rag-enabled` flag to toggle RAG augmentation of the textbook on or off; additional arguments let you customize the RAG provider.

   See a [sample output here](sciphi/data/library_of_phi/sample/Aerodynamics_of_Viscous_Fluids.md).

3. **Crafting with a Custom Table of Contents**: 

   Prepare your table of contents and save it as `textbook_name.yaml`. Then, move this file to the recommended directory.
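   A table of contents file might look like the following. This is a hypothetical sketch for illustration only; the exact schema expected by the generation script may differ, and the chapter/section names here are invented:

   ```yaml
   # textbook_name.yaml — illustrative table-of-contents sketch (schema is assumed)
   textbook: Aerodynamics_of_Viscous_Fluids
   chapters:
     - title: Introduction to Viscous Flows
       sections:
         - Boundary Layer Theory
         - The Navier-Stokes Equations
     - title: Laminar-to-Turbulent Transition
       sections:
         - Stability Analysis
   ```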

4. **Activating RAG Functionality**: 

   Set `--rag-enabled=True`. Ensure the relevant `.env` variables are configured, or pass `rag_api_base` and `rag_api_key` on the CLI.
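
   For example (illustrative placeholder values; substitute your own provider endpoint and key):

   ```bash
   python sciphi/scripts/generate_textbook.py run \
     --textbook=Aerodynamics_of_Viscous_Fluids \
     --rag-enabled=True \
     --rag_api_base=<your-provider-url> \
     --rag_api_key=<your-api-key>
   ```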

_Important:_ To make the most out of grounding your data with Wikipedia, ensure your system matches our detailed specifications. We offer additional examples and resources [here](https://github.com/emrgnt-cmplxty/library_of_phi/tree/main).

### RAG Eval Harness

To measure the efficacy of your RAG pipeline, we provide a dedicated RAG evaluation harness.

#### Deploying the RAG Harness

1. **Initiate the Harness**:
   ```bash
   poetry run python sciphi/scripts/rag_harness.py --n-samples=100 --rag-enabled=True --evals_to_run="science_multiple_choice"
   ```

---

## Local Development

1. **Clone the Repository**:
   
   Begin by cloning the repository and stepping into the project directory:
   ```bash
   git clone https://github.com/emrgnt-cmplxty/sciphi.git
   cd sciphi
   ```

2. **Install the Dependencies**:

   Start by installing the primary requirements:
   ```bash
   pip install -r requirements.txt
   ```

   If you require further functionalities, consider the following:
   - For the developer's toolkit and utilities:
     ```bash
     pip install -r requirements-dev.txt
     ```

   - To encompass all optional dependencies:
     ```bash
     pip install -r requirements_all.txt
     ```

   Alternatively, to manage packages using Poetry:
   ```bash
   poetry install
   ```

   To include optional dependency groups with Poetry, pass the extras by name:
   ```bash
   poetry install -E all
   # or, for everything:
   poetry install -E all_with_extras
   ```

3. **Setting Up Your Environment**:

   Begin by duplicating the sample environment file to craft your own:
   ```bash
   cp .env.example .env
   ```

   Next, use a text editor to adjust the `.env` file with your specific configurations. An example with `vim` is shown below:
   ```bash
   vim .env
   ```

   After entering your settings, ensure you save and exit the file.

---

## System Requirements

### Essential Packages:

- **Python Version**: `>=3.9,<3.12`

- **Required Libraries**:
  - `bs4`: `^0.0.1`
  - `fire`: `^0.5.0`
  - `openai`: `0.27.8`
  - `pandas`: `^2.1.0`
  - `python-dotenv`: `^1.0.0`
  - `pyyaml`: `^6.0.1`
  - `retrying`: `^1.3.4`
  - `sentencepiece`: `^0.1.99`
  - `torch`: `^2.1.0`
  - `tiktoken`: `^0.5.1`
  - `tqdm`: `^4.66.1`

### Supplementary Packages:

- **Anthropic Integration**:
  - `anthropic`: `^0.3.10`
- **Hugging Face Tools**:
  - `accelerate`: `^0.23.0`
  - `datasets`: `^2.14.5`
  - `transformers`: `^4.33.1`
- **Llama-Index**:
  - `llama-index`: `^0.8.29.post1`
- **Llama-CPP**:
  - `llama-cpp-python`: `^0.2.11`
- **VLLM Tools**:
  - `vllm`: `0.2.0`
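
As a quick sanity check before installing, the Python version constraint above can be verified against the running interpreter. This is a minimal sketch; the helper name is our own, not part of SciPhi:

```python
import sys

# SciPhi supports Python >=3.9,<3.12 per the package metadata.
def is_supported(version_info=sys.version_info):
    """Return True if the interpreter version falls in [3.9, 3.12)."""
    major, minor = version_info[0], version_info[1]
    return (3, 9) <= (major, minor) < (3, 12)

if __name__ == "__main__":
    print("supported" if is_supported() else "unsupported")
```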

---

## Licensing and Acknowledgment

This project is licensed under the [Apache-2.0 License](./LICENSE).

### Citing Our Work

If SciPhi plays a role in your research, we kindly ask you to acknowledge us with the following citation:

```bibtex
@software{SciPhi,
author = {Colegrove, Owen},
doi = {Pending},
month = {09},
title = {{SciPhi: A Framework for LLM Powered Data}},
url = {https://github.com/sciphi-ai/sciphi},
year = {2023}
}
```

            
