levseq

Name	levseq
Version	1.4.3
home_page	https://github.com/fhalab/levseq/
Summary	None
upload_time	2025-09-06 18:52:22
maintainer	None
docs_url	None
author	Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy
requires_python	>=3.8
license	GPL3
keywords	nanopore ont evseq
VCS
bugtrack_url
requirements	Bio biopandas biopython bokeh fsspec h5py holoviews hvplot jupyter_bokeh jupyterlab mappy matplotlib ninetysix numpy pandas panel pybedtools pycoQC pyfaidx pyparsing pysam scipy sciutil seaborn scikit-learn statsmodels tqdm logomaker biopandas
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Variant Sequencing with Nanopore (LevSeq)

LevSeq provides a streamlined pipeline for sequencing and analyzing genetic variants using Oxford Nanopore technology. In directed evolution experiments, LevSeq enables sequencing of every variant, enhancing data insight and creating datasets suitable for AI/ML methods. Sequence variants can be generated within a day at an extremely low cost.

![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.jpeg)
Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore technology. This diagram illustrates the key steps in the process, from sample preparation to data analysis and visualization.

## Notes

LevSeq was designed for epPCR and SSM experiments. We are currently extending it to work for other enzyme engineering designs as well; the following features are under development:

1. Insertion handling (see version 1.4.3) - thanks to Brian Zhong for his contributions to this section!
2. Gene calling (handling different genes, use the `--oligopool` flag)

If you notice any issues with new features or have adapted the LevSeq code for your own use cases, we would love community contributions! Please submit either an issue or a pull request, and we will aim to incorporate the changes.

## Quick Start

The current stable version is `1.4.2`; the latest version is `1.4.3`.

Stable releases are available via Docker and pip. To use the latest version, clone the repo and install locally (see *Local development or install of latest version* below).

### Docker Installation (Recommended)

1. Install Docker: [https://docs.docker.com/engine/install/](https://docs.docker.com/engine/install/)
2. Pull the appropriate image:
   ```bash
   # For Linux/Windows x86 systems:
   docker pull yueminglong/levseq:levseq-1.4-x86
   
   # For Mac M-series chips (M1, M2, M3, M4):
   docker pull yueminglong/levseq:levseq-1.4-arm64
   ```
3. Run LevSeq:
   ```bash
   docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
   ```
4. Connect function data to your sequence data:
   ```bash
   docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
   ```
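
The image tag in the commands above depends on the host CPU architecture. A minimal sketch of picking it automatically, assuming that ARM hosts (including Mac M-series) report `arm64` or `aarch64` from Python's `platform.machine()`. `pick_levseq_image` is a hypothetical helper for illustration, not part of LevSeq:

```python
import platform

# Image tags copied from the pull commands above.
IMAGES = {
    "x86": "yueminglong/levseq:levseq-1.4-x86",
    "arm64": "yueminglong/levseq:levseq-1.4-arm64",
}


def pick_levseq_image(machine=None):
    """Return the Docker image tag matching the host CPU architecture."""
    machine = (machine or platform.machine()).lower()
    # Assumption: ARM hosts report "arm64" or "aarch64";
    # every other value is treated as x86.
    if machine in ("arm64", "aarch64"):
        return IMAGES["arm64"]
    return IMAGES["x86"]


print(pick_levseq_image())
```

The returned tag can then be substituted into the `docker pull` and `docker run` commands.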

### Pip Installation (Mac/Linux only)

**IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
```bash
brew install gcc@13 gcc@14
```

1. Create and activate conda environment:
   ```bash
   conda create --name levseq python=3.12 -y
   conda activate levseq
   ```

2. Install dependencies:
   ```bash
   conda install -c bioconda -c conda-forge samtools minimap2
   ```

3. Install LevSeq:
   ```bash
   pip install levseq
   ```

4. Run LevSeq:
   ```bash
   levseq my_experiment /path/to/data/ /path/to/ref.csv
   ```

5. Combine function data:
   ```bash
   levseq my_experiment /path/to/data/ /path/to/ref.csv --fitness_files "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv" --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
   ```

For function data, LevSeq currently expects LCMS files in CSV format with the following columns:
- `Sample Vial Number`: the well that the sample came from.
- `Area`: used as the fitness value.
- `Compound Name`: the name of the compound to filter for, passed with `--compound`.

Each fitness file name must end with `_X.csv`, where `X` is the barcode number that matches the sample to your plate. For example, if you ran LevSeq with barcode 33 for plate 2, the fitness file for plate 2 must end with `_33.csv`, e.g. `some_fitness_for_plate_2_33.csv`.
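
Before running the pipeline, it may help to check fitness files against the naming rule and column list above. The following is a minimal sketch using only the Python standard library. The helper names `barcode_from_filename` and `missing_lcms_columns` are hypothetical and not part of the LevSeq API:

```python
import re

# Column names taken from the list above.
REQUIRED_LCMS_COLUMNS = ("Sample Vial Number", "Area", "Compound Name")


def barcode_from_filename(filename):
    """Return the trailing barcode number of a fitness file name, or None."""
    match = re.search(r"_(\d+)\.csv$", filename)
    return int(match.group(1)) if match else None


def missing_lcms_columns(header):
    """Return the required columns that are absent from a CSV header row."""
    present = {column.strip() for column in header}
    return [column for column in REQUIRED_LCMS_COLUMNS if column not in present]


print(barcode_from_filename("20250712_epPCR_Q06714_37.csv"))
print(missing_lcms_columns(["Sample Vial Number", "Area"]))
```

A file whose name yields `None`, or whose header is missing a column, probably needs to be renamed or re-exported before it is passed to `--fitness_files`.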


## Data and Visualization

- **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
- **Visualization Tool**: A web application is available at [https://levseqdb.streamlit.app/](https://levseqdb.streamlit.app/) - simply upload your LevSeq output and LCMS results
- **Self-hosted Solution**: You can deploy your own instance using our [LevSeq_db repository](https://github.com/fhalab/LevSeq_db)

## Reference File Format (ref.csv)

Your reference CSV file must contain the following columns:

| barcode_plate | name   | refseq    |
|---------------|--------|-----------|
| 33            | Q97A76 | ATGCGC... |

For oligopool experiments (multiple proteins per plate), use:

| barcode_plate | name   | refseq    |
|---------------|--------|-----------|
| 33            | Q97A76 | ATGCGCAAG |
| 33            | P96084 | ATGGATCA  |
| 34            | P46209 | ATGGGGCAA |
| 34            | Q60336 | ATGGGGCC  |
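
As a rough illustration of the format, the sketch below parses a reference file, checks for the three required columns, and groups reference names by plate. It assumes the file is a plain comma-separated CSV. The function `references_per_plate` is a hypothetical helper, not LevSeq's own parser:

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("barcode_plate", "name", "refseq")


def references_per_plate(csv_text):
    """Map each barcode_plate to the reference names listed for it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in fieldnames]
    if missing:
        raise ValueError(f"ref.csv is missing columns: {missing}")
    plates = defaultdict(list)
    for row in reader:
        plates[row["barcode_plate"].strip()].append(row["name"].strip())
    return dict(plates)


# Same rows as the oligopool table above.
example = """barcode_plate,name,refseq
33,Q97A76,ATGCGCAAG
33,P96084,ATGGATCA
34,P46209,ATGGGGCAA
34,Q60336,ATGGGGCC
"""

plates = references_per_plate(example)
# Plates with more than one reference match the oligopool layout.
oligopool_plates = [plate for plate, names in plates.items() if len(names) > 1]
print(plates)
print(oligopool_plates)
```

If any plate lists more than one reference, the run presumably needs the `--oligopool` flag.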

## Command Line Arguments

### Required Arguments
1. **name**: Name of the experiment (output folder)
2. **path**: Location of basecalled fastq files
3. **summary**: Path to reference CSV file

### Optional Arguments
- `--skip_demultiplexing`: Skip the demultiplexing step
- `--skip_variantcalling`: Skip the variant calling step
- `--output`: Custom save location (defaults to current directory)
- `--show_msa`: Show multiple sequence alignment for each well
- `--oligopool`: Process data as oligopool experiment
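
For scripted runs, the arguments above can be assembled programmatically. This is a minimal sketch, assuming `--output` takes a path and the remaining optional flags are simple switches that take no value. `build_levseq_command` is a hypothetical wrapper, not part of LevSeq:

```python
import shlex


def build_levseq_command(name, path, summary, output=None, oligopool=False,
                         show_msa=False, skip_demultiplexing=False,
                         skip_variantcalling=False):
    """Assemble a levseq argument list from the options documented above."""
    command = ["levseq", name, path, summary]
    if output is not None:
        command += ["--output", output]
    # Assumption: these flags are boolean switches without values.
    switches = (
        ("--skip_demultiplexing", skip_demultiplexing),
        ("--skip_variantcalling", skip_variantcalling),
        ("--show_msa", show_msa),
        ("--oligopool", oligopool),
    )
    for flag, enabled in switches:
        if enabled:
            command.append(flag)
    return command


command = build_levseq_command(
    "my_experiment", "/path/to/data/", "/path/to/ref.csv", oligopool=True
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once LevSeq is installed.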

## Step-by-Step Tutorial

1. **Prepare your sequencing data**:
   - Your fastq files should be in a directory structure similar to Nanopore's output
   - Prepare a reference CSV file with barcode plates, sample names, and reference sequences

2. **Run LevSeq**:
   ```bash
   # Via Docker
   docker run --rm -v "/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
   
   # Via pip
   levseq my_experiment /path/to/data/ /path/to/ref.csv
   ```

3. **Analyze results**:
   - Output includes variant data (CSV) and interactive visualizations (HTML)
   - Upload results to the LevSeq visualization tool for further analysis
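
To locate the results after a run, a short script can collect the CSV and HTML files. This sketch assumes the outputs are written under a folder named after the experiment. The exact file names and folder layout are not specified here, so it simply searches recursively, and `list_levseq_outputs` is a hypothetical helper:

```python
from pathlib import Path


def list_levseq_outputs(output_dir):
    """Return sorted CSV and HTML files found anywhere under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        # Nothing to report if the run folder does not exist yet.
        return {"csv": [], "html": []}
    return {
        "csv": sorted(root.rglob("*.csv")),
        "html": sorted(root.rglob("*.html")),
    }


outputs = list_levseq_outputs("my_experiment")
print(len(outputs["csv"]), "CSV files,", len(outputs["html"]), "HTML files")
```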

## Experimental Setup

For the wet lab protocol:
- Refer to the [wiki](https://github.com/fhalab/LevSeq/wiki/Experimental-protocols)
- See the methods section of [our paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625)
- Order forward and reverse primers compatible with your plasmid
- Install Oxford Nanopore's software for basecalling if needed

## Additional Resources

- **Example Notebook**: See `example/Example.ipynb` for a walkthrough
- **Advanced Usage**: See the [manuscript notebook](https://github.com/fhalab/LevSeq/blob/main/manuscript/notebooks/epPCR_10plates.ipynb)
- **Troubleshooting**: See our [computational protocols wiki](https://github.com/fhalab/LevSeq/wiki/Computational-protocols)

### Local development or install of latest version

```bash
conda create --name levseq python=3.10
conda activate levseq
git clone git@github.com:fhalab/LevSeq.git
cd LevSeq
python setup.py sdist bdist_wheel
pip install dist/levseq-1.4.3.tar.gz
```

## Citing LevSeq

If you find LevSeq useful, please cite our paper:

```bibtex
@article{long2024levseq,
  title={LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning},
  author={Long, Yueming and Mora, Ariane and Li, Francesca-Zhoufan and Gürsoy, Emre and Johnston, Kadina E and Arnold, Frances H},
  journal={ACS Synthetic Biology},
  year={2024},
  publisher={American Chemical Society}
}
```

## Contact

Leave a feature request in the issues or reach us via [email](mailto:levseqdb@gmail.com).

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/fhalab/levseq/",
    "name": "levseq",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "Nanopore, ONT, evSeq",
    "author": "Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy",
    "author_email": "ylong@caltech.edu",
    "download_url": "https://files.pythonhosted.org/packages/e3/d9/ada7837657b789e834a5e79570e5b7fbab49a99d826bce9d613d27fbc19d/levseq-1.4.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL3",
    "summary": null,
    "version": "1.4.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/fhalab/levseq/",
        "Documentation": "https://github.com/fhalab/levseq/",
        "Homepage": "https://github.com/fhalab/levseq/",
        "Source Code": "https://github.com/fhalab/levseq/"
    },
    "split_keywords": [
        "nanopore",
        " ont",
        " evseq"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b43d76198e7939ae05da37b17459aff1bd7ca53cb7a1a8e11ad3697581cbf825",
                "md5": "1a1ebacbeded6b396432b9ec9a380eff",
                "sha256": "996baebdd4ffa164687207bbb99062fea9983f95afd237385db5a2242d7c9bca"
            },
            "downloads": -1,
            "filename": "levseq-1.4.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1a1ebacbeded6b396432b9ec9a380eff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 379535,
            "upload_time": "2025-09-06T18:52:20",
            "upload_time_iso_8601": "2025-09-06T18:52:20.111261Z",
            "url": "https://files.pythonhosted.org/packages/b4/3d/76198e7939ae05da37b17459aff1bd7ca53cb7a1a8e11ad3697581cbf825/levseq-1.4.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e3d9ada7837657b789e834a5e79570e5b7fbab49a99d826bce9d613d27fbc19d",
                "md5": "65a690cba3dc54f8ac8d2995745050ba",
                "sha256": "3841a633d1a152b3b581d3ca1036b4674db17ec1d4ca87ecc79a1e93de04c2fb"
            },
            "downloads": -1,
            "filename": "levseq-1.4.3.tar.gz",
            "has_sig": false,
            "md5_digest": "65a690cba3dc54f8ac8d2995745050ba",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 369098,
            "upload_time": "2025-09-06T18:52:22",
            "upload_time_iso_8601": "2025-09-06T18:52:22.170345Z",
            "url": "https://files.pythonhosted.org/packages/e3/d9/ada7837657b789e834a5e79570e5b7fbab49a99d826bce9d613d27fbc19d/levseq-1.4.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-06 18:52:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fhalab",
    "github_project": "levseq",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "Bio",
            "specs": []
        },
        {
            "name": "biopandas",
            "specs": []
        },
        {
            "name": "biopython",
            "specs": []
        },
        {
            "name": "bokeh",
            "specs": []
        },
        {
            "name": "fsspec",
            "specs": []
        },
        {
            "name": "h5py",
            "specs": []
        },
        {
            "name": "holoviews",
            "specs": []
        },
        {
            "name": "hvplot",
            "specs": []
        },
        {
            "name": "jupyter_bokeh",
            "specs": []
        },
        {
            "name": "jupyterlab",
            "specs": []
        },
        {
            "name": "mappy",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "ninetysix",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "panel",
            "specs": [
                [
                    "==",
                    "1.2.3"
                ]
            ]
        },
        {
            "name": "pybedtools",
            "specs": []
        },
        {
            "name": "pycoQC",
            "specs": []
        },
        {
            "name": "pyfaidx",
            "specs": []
        },
        {
            "name": "pyparsing",
            "specs": []
        },
        {
            "name": "pysam",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "sciutil",
            "specs": []
        },
        {
            "name": "seaborn",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "statsmodels",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "logomaker",
            "specs": []
        },
        {
            "name": "biopandas",
            "specs": []
        }
    ],
    "lcname": "levseq"
}
