cytetype

- Name: cytetype
- Version: 0.8.3
- Summary: Python client for characterization of clusters from single-cell RNA-seq data.
- Upload time: 2025-07-30 14:24:39
- Requires Python: >=3.11
- License: CC BY-NC-SA 4.0
- Keywords: bioinformatics, single-cell, rna-seq, annotation, cell types

<h1 align="left">CyteType</h1>

<p align="left">
  <!-- GitHub Actions CI Badge -->
  <a href="https://github.com/NygenAnalytics/cytetype/actions/workflows/publish.yml">
    <img src="https://github.com/NygenAnalytics/cytetype/actions/workflows/publish.yml/badge.svg" alt="CI Status">
  </a>
  <a href="https://github.com/NygenAnalytics/cytetype/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg" alt="License: CC BY-NC-SA 4.0">
  </a>
  <a href="https://pypi.org/project/cytetype/">
    <img src="https://img.shields.io/pypi/v/cytetype.svg" alt="PyPI version">
  </a>
  <img src="https://img.shields.io/badge/python-≥3.11-blue.svg" alt="Python Version">
  <a href="https://colab.research.google.com/drive/1aRLsI3mx8JR8u5BKHs48YUbLsqRsh2N7?usp=sharing">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
  </a>
</p>

---

> ⚠️ CyteType is under active development and breaking changes may be introduced. Please work with the latest version to ensure compatibility and access to new features.

**CyteType** is a Python package for deep characterization of cell clusters from single-cell RNA-seq data. It works with `AnnData` objects and calls the CyteType API.

## Table of Contents

- [Example Report](#example-report)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Usage](#usage)
  - [Required Preprocessing](#required-preprocessing)
  - [Annotation](#annotation)
- [Configuration Options](#configuration-options)
  - [Initialization Parameters](#initialization-parameters)
  - [Submitting Annotation job](#submitting-annotation-job)
  - [Custom LLM Configuration](#custom-llm-configuration)
  - [Custom LLM Configuration (Ollama)](#custom-llm-configuration-ollama)
  - [Advanced parameters](#advanced-parameters)
- [Annotation Process](#annotation-process)
  - [Core Functionality](#core-functionality)
  - [Advanced Context Generation](#advanced-context-generation)
  - [Result Format](#result-format)
  - [Advanced Result Components](#advanced-result-components)
- [Development](#development)
  - [Setup](#setup)
  - [Exception Handling](#exception-handling)
  - [Testing](#testing)
- [License](#license)

## Example Report

View a sample annotation report: <a href="https://nygen-labs-prod--cytetype-api.modal.run/report/77069508-d9f1-4a79-bdab-5870fc3ccdf3?v=250722" target="_blank">CyteType Report</a>



## Quick Start

```python
import anndata
import scanpy as sc
import cytetype

# Load and preprocess your data
adata = anndata.read_h5ad("path/to/your/data.h5ad")
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, key_added='clusters')
sc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')

# Initialize CyteType (performs data preparation)
annotator = cytetype.CyteType(adata, group_key='clusters')

# Run annotation
adata = annotator.run(
    study_context="Human brain tissue from Alzheimer's disease patients"
)

# View results
print(adata.obs.cytetype_annotation_clusters)
print(adata.obs.cytetype_cellOntologyTerm_clusters)
```

## Installation

```bash
pip install cytetype
```

## Usage

### Required Preprocessing

Your `AnnData` object must have:

- Log-normalized expression data in `adata.X`
- Cluster labels in `adata.obs` 
- Differential expression results from `sc.tl.rank_genes_groups`

```python
import scanpy as sc

# Standard preprocessing
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Clustering
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, key_added='clusters')

# Differential expression (required)
sc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')
```
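
Checking these prerequisites before constructing the annotator can catch a missing step early. The following is a minimal sketch of such a check; `missing_prerequisites` is a hypothetical helper, not part of cytetype. It assumes the default scanpy key `rank_genes_groups` and relies on scanpy recording the grouping column under `params['groupby']`.

```python
def missing_prerequisites(adata, group_key="clusters", rank_key="rank_genes_groups"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper (not part of cytetype). Works with any object that
    exposes mapping-like `.obs` and `.uns`, such as an AnnData object.
    """
    problems = []
    if group_key not in adata.obs:
        problems.append(f"adata.obs has no '{group_key}' column")
    if rank_key not in adata.uns:
        problems.append(f"adata.uns has no '{rank_key}' entry; run sc.tl.rank_genes_groups")
    else:
        # scanpy stores the grouping column used for the test under params['groupby'].
        groupby = adata.uns[rank_key].get("params", {}).get("groupby")
        if groupby is not None and groupby != group_key:
            problems.append(f"'{rank_key}' was computed for '{groupby}', not '{group_key}'")
    return problems
```

Calling `missing_prerequisites(adata, group_key='clusters')` right after preprocessing returns an empty list when both the cluster column and matching differential expression results are present.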

### Annotation

```python
from cytetype import CyteType

# Initialize (data preparation happens here)
annotator = CyteType(adata, group_key='clusters')

# Run annotation
adata = annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions."
)

# Or with custom metadata for tracking
adata = annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions.",
    metadata={
        'experiment_name': 'Brain_AD_Study',
        'run_label': 'initial_analysis'
    }
)

# Results are stored in:
# - adata.obs.cytetype_annotation_clusters (cell type annotations)
# - adata.obs.cytetype_cellOntologyTerm_clusters (cell ontology terms)
# - adata.uns['cytetype_results'] (full API response)
```
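
Because annotations are assigned per cluster, a cluster-to-label lookup is often handier than the per-cell column. The sketch below builds one. `cluster_to_annotation` is a hypothetical helper, and the column name pattern `cytetype_annotation_<group_key>` is inferred from the examples above rather than taken from a documented contract.

```python
def cluster_to_annotation(obs, group_key="clusters"):
    """Map each cluster label to its CyteType annotation.

    `obs` is any mapping of column name -> sequence, e.g. `adata.obs`.
    The result column name is an assumption based on the README examples.
    """
    clusters = obs[group_key]
    annotations = obs[f"cytetype_annotation_{group_key}"]
    mapping = {}
    for cluster, annotation in zip(clusters, annotations):
        # Every cell in a cluster should carry the same label; keep the first seen.
        mapping.setdefault(cluster, annotation)
    return mapping
```

With a real `AnnData` object this would be called as `cluster_to_annotation(adata.obs, group_key='clusters')`.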

## Configuration Options

### Initialization Parameters

```python
annotator = CyteType(
    adata,
    group_key='leiden',                    # Required: cluster column name
    rank_key='rank_genes_groups',          # DE results key
    gene_symbols_column='gene_symbols',    # Gene symbols column
    n_top_genes=50,                        # Top marker genes per cluster
    aggregate_metadata=True,               # Aggregate metadata
    min_percentage=10,                     # Min percentage for cluster context
    pcent_batch_size=2000,                 # Batch size for calculations
    coordinates_key='X_umap',              # Coordinates key for visualization
    max_cells_per_group=1000,              # Max cells per group for visualization
)
```

### Submitting Annotation job

The `run` method accepts several configuration parameters to control the annotation process:

```python
annotator.run(
    study_context="Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions.",
    metadata={
        'experiment_name': 'Brain_AD_Study',
        'run_label': 'initial_analysis'
    },
    save_query=True,
    query_filename="query.json",
    show_progress=True,
)
```

### Custom LLM Configuration

By default, the CyteType API uses a preselected set of LLM providers.
You can instead supply your own models and model providers through `llm_configs`.
If you supply several models, they are used iteratively for each of the clusters.

```python
adata = annotator.run(
    study_context="Human PBMC from COVID-19 patients",
    llm_configs=[{
        'provider': 'openai',
        'name': 'gpt-4o-mini',
        'apiKey': 'your-api-key',
        'baseUrl': 'https://api.openai.com/v1',  # Optional
        'modelSettings': {                       # Optional
            'temperature': 0.0,
            'max_tokens': 4096
        }  
    }],
)
```
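
Hard-coding API keys in notebooks is easy to leak, so one option is to read them from environment variables when building each `llm_configs` entry. The sketch below does that. `make_llm_config` is a hypothetical helper (not part of cytetype), the environment variable name is whatever you choose, and the provider list and dictionary keys are copied from this section.

```python
import os

# Provider names as listed in this README.
SUPPORTED_PROVIDERS = {
    "openai", "anthropic", "google", "xai",
    "groq", "mistral", "openrouter", "bedrock",
}


def make_llm_config(provider, name, api_key_env, **model_settings):
    """Build one `llm_configs` entry, reading the API key from the environment."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    api_key = os.environ.get(api_key_env)
    if not api_key:
        raise RuntimeError(f"Environment variable {api_key_env} is not set")
    config = {"provider": provider, "name": name, "apiKey": api_key}
    if model_settings:
        # Optional settings such as temperature or max_tokens.
        config["modelSettings"] = dict(model_settings)
    return config
```

The result can be passed as `llm_configs=[make_llm_config("openai", "gpt-4o-mini", "OPENAI_API_KEY", temperature=0.0)]`, with one entry per model.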

Supported providers: `openai`, `anthropic`, `google`, `xai`, `groq`, `mistral`, `openrouter`, `bedrock`

#### Rate Limits

If you do not supply your own model providers, the CyteType API applies rate limits for fair usage:

- Annotation submissions: 5 requests per day (RPD)
- Reannotation: 10 requests per day (RPD)
- Report retrieval: 20 requests per minute (RPM)

If you exceed a rate limit, the API returns an error message that includes retry timing information.

### Custom LLM Configuration (Ollama)

The CyteType API supports Ollama models as well. You will need to expose your Ollama server to the internet using a tunneling service. Refer to the [OLLAMA.md](./OLLAMA.md) file for instructions on how to do this.

### Advanced parameters

```python
adata = annotator.run(
    study_context="Human PBMC from COVID-19 patients",  # other arguments as shown earlier
    # API polling and timeout settings
    poll_interval_seconds=30,           # How often to check for results (default)
    timeout_seconds=7200,               # Max wait time (default: 2 hours)
    
    # API configuration
    api_url="https://custom-api.com",   # Custom API endpoint
    auth_token="your-auth-token",       # Authentication token
)
```
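
To make the interaction between `poll_interval_seconds` and `timeout_seconds` concrete, here is a generic polling loop. It is a sketch of the general technique under stated assumptions, not the client's actual implementation: `poll_until_ready` and its `fetch` callback are hypothetical names, and the built-in `TimeoutError` stands in for whatever the package raises.

```python
import time


def poll_until_ready(fetch, poll_interval_seconds=30, timeout_seconds=7200,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a non-None result or the timeout elapses.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        result = fetch()
        if result is not None:
            return result
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval_seconds > deadline:
            raise TimeoutError(f"No result after {timeout_seconds} seconds")
        sleep(poll_interval_seconds)
```

A shorter interval returns results sooner at the cost of more requests; the timeout bounds the total wait regardless of the interval.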

### Authentication and Authorization

You can pass your own token to the `run` method with the `auth_token` parameter. The token is sent in the `Authorization` header as `Bearer {auth_token}` and is used to authenticate all API requests.
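
For readers unfamiliar with Bearer authentication, the header described above has the following shape. This is an illustrative sketch only; `auth_headers` is a hypothetical function and does not reflect how the client assembles its requests internally.

```python
def auth_headers(auth_token=None):
    """Return request headers carrying the token as a Bearer credential."""
    if not auth_token:
        # No token supplied: send no Authorization header.
        return {}
    return {"Authorization": f"Bearer {auth_token}"}
```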

## Annotation Process

CyteType performs comprehensive cell type annotation through an automated pipeline:

### Core Functionality

- **Automated Annotation**: Identifies likely cell types for each cluster based on marker genes
- **Ontology Mapping**: Maps identified cell types to Cell Ontology terms (e.g., `CL_0000127`)  
- **Review & Justification**: Analyzes supporting/conflicting markers and assesses confidence
- **Literature Search**: Searches for relevant literature to support the annotation
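
Cell Ontology identifiers appear in two common spellings: the underscore form shown above (`CL_0000127`) and the CURIE form (`CL:0000127`). The sketch below converts between them and builds the corresponding OBO PURL. `normalize_cl_term` is a hypothetical helper; that CyteType always returns the underscore form is an assumption based on the example in this list.

```python
import re

# Cell Ontology IDs are the prefix "CL" followed by seven digits.
_CL_PATTERN = re.compile(r"^CL[_:](\d{7})$")


def normalize_cl_term(term):
    """Return (curie, purl) for a Cell Ontology ID such as 'CL_0000127'."""
    match = _CL_PATTERN.match(term.strip())
    if match is None:
        raise ValueError(f"Not a Cell Ontology ID: {term!r}")
    digits = match.group(1)
    return f"CL:{digits}", f"http://purl.obolibrary.org/obo/CL_{digits}"
```

The PURL resolves to the term's ontology page, which is useful when linking annotations from a report or notebook.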

### Advanced Context Generation

CyteType generates detailed contextual information to inform annotations:

**Dataset-Level Context**: Comprehensive analysis of experimental metadata:
```
"This dataset originates from multiple human tissues including adrenal gland, 
brain, liver, lung, lymph node, and pleural effusion, with samples derived 
from both healthy individuals and patients diagnosed with lung adenocarcinoma 
or small cell lung carcinoma. Experimental data was generated via 10X Genomics 
Chromium 3' single-cell sequencing, which may introduce platform-specific 
technical artifacts."
```

**Cluster-Specific Context**: Detailed metadata analysis for each cluster:
```
"Cluster 1 comprises 99% lung-derived cells, with 65% originating from lung 
adenocarcinoma samples and 33% from normal tissue. The cells are distributed 
across two primary donors with demographic characteristics including 67% 
female donors and 97% self-reported European ethnicity. Treatment conditions 
include Platinum Doublet (55%) and Naive (44%)."
```

This contextual information enables more accurate annotations by considering:
- **Tissue Origins**: Multi-tissue datasets with precise anatomical mapping
- **Disease States**: Healthy vs. pathological conditions with treatment history
- **Technical Factors**: Sequencing platforms, batch effects, and processing methods
- **Demographics**: Age, sex, and ethnicity distributions
- **Treatment Context**: Therapeutic interventions and their potential cellular effects
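
The percentages in the cluster-specific context are, in essence, per-cluster category frequencies. The sketch below shows one way such a summary could be computed. It is an illustration under stated assumptions, not CyteType's implementation: `cluster_composition` is a hypothetical function, and the `min_percentage` name is borrowed from the initialization parameters on the assumption that it filters out rare categories.

```python
from collections import Counter, defaultdict


def cluster_composition(clusters, values, min_percentage=10):
    """Per cluster, the percentage of cells in each metadata category.

    Categories that make up less than `min_percentage` of a cluster are dropped.
    """
    counts = defaultdict(Counter)
    for cluster, value in zip(clusters, values):
        counts[cluster][value] += 1

    composition = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        composition[cluster] = {
            value: round(100 * n / total, 1)
            for value, n in counter.items()
            if 100 * n / total >= min_percentage
        }
    return composition
```

With an `AnnData` object, `clusters` and `values` would be two columns of `adata.obs`, for example the cluster labels and a tissue column.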

### Result Format

Results include comprehensive annotations for each cluster with expert-level analysis:

```python
# Access results after annotation using the helper method
results = annotator.get_results()

# Or access directly from the stored JSON string
import json
results = json.loads(adata.uns['cytetype_results']['result'])

# Each annotation includes comprehensive information:
for annotation in results['annotations']:
    print(f"Cluster: {annotation['clusterId']}")
    print(f"Cell Type: {annotation['annotation']}")
    print(f"Granular Annotation: {annotation['granularAnnotation']}")
    print(f"Cell State: {annotation['cellState']}")
    print(f"Confidence: {annotation['confidence']}")
    print(f"Ontology Term: {annotation['ontologyTerm']}")
    print(f"Is Heterogeneous: {annotation['isHeterogeneous']}")
    
    # Supporting evidence and conflicts
    print(f"Supporting Markers: {annotation['supportingMarkers']}")
    print(f"Conflicting Markers: {annotation['conflictingMarkers']}")
    print(f"Missing Expression: {annotation['missingExpression']}")
    print(f"Unexpected Expression: {annotation['unexpectedExpression']}")
    
    # Expert review and justification
    print(f"Justification: {annotation['justification']}")
    print(f"Review Comments: {annotation['reviewComments']}")
    print(f"Feedback: {annotation['feedback']}")
    
    # Similarity and literature support
    print(f"Similar Clusters: {annotation['similarity']}")
    print(f"Corroborating Papers: {len(annotation['corroboratingPapers']['papers'])} papers")
    
    # Model usage and performance metrics
    print(f"Models Used: {annotation['llmModels']}")
    print(f"Total Processing Time: {annotation['usageInfo']['total_runtime_seconds']:.1f}s")
    print(f"Total Tokens: {annotation['usageInfo']['total_tokens']}")
```
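
For a quick overview, the per-cluster fields can be flattened into rows. The sketch below uses the key names from the loop above and reads them with `.get`, on the assumption that fields may be absent in some API versions given that the package is under active development. `annotation_table` is a hypothetical helper.

```python
def annotation_table(results):
    """Flatten results['annotations'] into (cluster, cell type, ontology term, confidence) rows."""
    rows = []
    for annotation in results.get("annotations", []):
        rows.append((
            annotation.get("clusterId"),
            annotation.get("annotation"),
            annotation.get("ontologyTerm"),
            annotation.get("confidence"),
        ))
    return rows
```

The rows can be printed directly or passed to `pandas.DataFrame(rows, columns=[...])` for further filtering.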

### Advanced Result Components

**Expert Review System**: Each annotation undergoes multi-stage review with detailed feedback:
- **Review Comments**: Expert-level biological interpretation and mechanistic insights
- **Confidence Assessment**: Moderate/High confidence based on marker evidence
- **Feedback Loop**: Iterative refinement based on biological plausibility
- **Mechanistic Analysis**: Discussion of signaling pathways, developmental biology, and disease pathogenesis

**Literature Integration**: Automatic literature search provides supporting evidence:
- **Corroborating Papers**: Relevant publications with PMIDs and summaries
- **Biological Context**: Integration of current research to validate annotations

Example corroborating papers:
```python
papers = annotation['corroboratingPapers']['papers']
for paper in papers:
    print(f"Title: {paper['title']}")
    print(f"PMID: {paper['pmid']}")
    print(f"Journal: {paper['journal']} ({paper['year']})")
    print(f"Summary: {paper['summary']}")
```

Sample output:
```
Title: YAP regulates alveolar epithelial cell differentiation and AGER via NFIB/KLF5/NKX2-1
PMID: 34466790
Journal: iScience (2021)
Summary: Documents atypical HOPX+AGER+SFTPC+ 'dual-positive' alveolar cells that 
persist in mature lungs, directly validating the mixed AT1/AT2 phenotype observed 
in malignant clusters.
```
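
Since each paper carries a PMID, it is straightforward to turn the list into clickable PubMed links. The sketch below builds the standard PubMed URL for a PMID; `pubmed_url` is a hypothetical helper, not part of cytetype.

```python
def pubmed_url(pmid):
    """Return the PubMed page URL for a numeric PMID (given as str or int)."""
    pmid = str(pmid).strip()
    if not pmid.isdigit():
        raise ValueError(f"Not a valid PMID: {pmid!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
```

For the sample paper above, `pubmed_url(34466790)` gives the link to its PubMed record.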

**Marker Analysis**: Comprehensive evaluation of gene expression patterns:
- **Supporting Markers**: Genes that strongly support the annotation
- **Conflicting Markers**: Genes that challenge the annotation with explanations
- **Missing/Unexpected Expression**: Detailed analysis of expression anomalies with biological explanations

Example unexpected expression analysis:
```
"Expression of AT2 markers (SFTPC, SFTPB) and club cell marker (SCGB1A1) 
in a cluster with strong AT1 markers" 
→ Explained by: "dedifferentiation process in cancer where transformed 
epithelial cells exhibit aberrant co-expression of markers from multiple 
lineages due to pathological plasticity"
```

**Performance Metrics**: Detailed usage statistics for transparency:
- **Model Information**: Which LLM models were used for each analysis step
- **Runtime Statistics**: Processing time and token usage per cluster
- **Annotation Attempts**: Number of refinement iterations
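
The per-cluster usage figures can be summed to estimate the cost of a whole run. The sketch below assumes the `usageInfo` keys shown in the Result Format example (`total_tokens`, `total_runtime_seconds`) and treats missing values as zero. `usage_totals` is a hypothetical helper.

```python
def usage_totals(results):
    """Sum token usage and runtime across all annotated clusters."""
    total_tokens = 0
    total_seconds = 0.0
    annotations = results.get("annotations", [])
    for annotation in annotations:
        usage = annotation.get("usageInfo") or {}
        total_tokens += usage.get("total_tokens", 0)
        total_seconds += usage.get("total_runtime_seconds", 0.0)
    return {
        "clusters": len(annotations),
        "total_tokens": total_tokens,
        "total_runtime_seconds": total_seconds,
    }
```

Note that summed runtime is not wall-clock time if clusters are processed in parallel.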

#### Example Annotations

CyteType provides sophisticated, multi-layered annotations:

**Basic Cell Type**: `"B-cell"`, `"Lung Adenocarcinoma Cell"`

**Granular Annotations**: Detailed phenotypic descriptions:
- `"AGER-positive, HOPX-positive, KRT19-positive lung adenocarcinoma cell with mixed AT1/AT2 phenotype"`
- `"EMT-transitioned, pleural metastasis-competent adenocarcinoma cell with platinum-induced stress phenotype"`
- `"CD74-high activated tumor-infiltrating B-cell in lung adenocarcinoma microenvironment"`

**Cell States**: Functional and pathological states:
- `"Transformed"`, `"Malignant"`, `"Activated"`, `"EMT-transitioned and stressed"`

**Expert Review Comments**: Detailed mechanistic insights:
```
"The mixed AT1/AT2 phenotype observed in this malignant cluster exemplifies 
the pathological dedifferentiation characteristic of lung adenocarcinoma, 
but the degree of lineage promiscuity suggests exceptional cellular plasticity 
beyond typical adenocarcinoma patterns. This may indicate activation of 
primitive developmental pathways like Wnt/β-catenin signaling..."
```

## Development

### Setup

```bash
git clone https://github.com/NygenAnalytics/cytetype.git
cd cytetype
uv sync --all-extras
uv run pip install -e .
```

### Exception Handling

The package defines several custom exceptions for different error scenarios:

- **`CyteTypeError`**: Base exception class for all CyteType-related errors
- **`CyteTypeAPIError`**: Raised for errors during API communication (network issues, invalid responses)
- **`CyteTypeTimeoutError`**: Raised when API requests time out
- **`CyteTypeJobError`**: Raised when the API reports an error for a specific job
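
When catching these exceptions, list the specific classes before the base class so that the base handler acts as a fallback. The sketch below demonstrates that ordering. So that it runs without the package installed, it defines stand-in classes with the same names; in real code you would import the classes from the cytetype package instead (the exact module path is not stated in this README). The flat hierarchy and the `run_safely` helper are assumptions for illustration.

```python
# Stand-ins mirroring the exception names listed above (assumed flat hierarchy).
class CyteTypeError(Exception):
    pass


class CyteTypeAPIError(CyteTypeError):
    pass


class CyteTypeTimeoutError(CyteTypeError):
    pass


class CyteTypeJobError(CyteTypeError):
    pass


def run_safely(run, **kwargs):
    """Call run(**kwargs) and return (result, error_message)."""
    try:
        return run(**kwargs), None
    except CyteTypeTimeoutError as exc:
        return None, f"timed out: {exc}"
    except CyteTypeJobError as exc:
        return None, f"job failed: {exc}"
    except CyteTypeAPIError as exc:
        return None, f"API error: {exc}"
    except CyteTypeError as exc:
        # Fallback for any other CyteType-related error.
        return None, f"CyteType error: {exc}"
```

In practice `run` would be `annotator.run`, called as `run_safely(annotator.run, study_context="...")`.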

### Testing

```bash
uv run pytest              # Run tests
uv run ruff check .        # Linting
uv run ruff format .       # Formatting
uv run mypy .              # Type checking
```

## License

Licensed under CC BY-NC-SA 4.0 - see [LICENSE](LICENSE) for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cytetype",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "bioinformatics, single-cell, RNA-seq, annotation, cell types",
    "author": null,
    "author_email": "Parashar Dhapola <parashar@nygen.io>",
    "download_url": "https://files.pythonhosted.org/packages/98/fd/f3e8b1b71d38b0eb50f38ee01854275035c1ab1690abc577dbb91fdcfffe/cytetype-0.8.3.tar.gz",
    "platform": null,
    "description": "<h1 align=\"left\">CyteType</h1>\n\n<p align=\"left\">\n  <!-- GitHub Actions CI Badge -->\n  <a href=\"https://github.com/NygenAnalytics/cytetype/actions/workflows/publish.yml\">\n    <img src=\"https://github.com/NygenAnalytics/cytetype/actions/workflows/publish.yml/badge.svg\" alt=\"CI Status\">\n  </a>\n  <a href=\"https://github.com/NygenAnalytics/cytetype/blob/main/LICENSE\">\n    <img src=\"https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg\" alt=\"License: CC BY-NC-SA 4.0\">\n  </a>\n  <a href=\"https://pypi.org/project/cytetype/\">\n    <img src=\"https://img.shields.io/pypi/v/cytetype.svg\" alt=\"PyPI version\">\n  </a>\n  <img src=\"https://img.shields.io/badge/python-\u22653.11-blue.svg\" alt=\"Python Version\">\n  <a href=\"https://colab.research.google.com/drive/1aRLsI3mx8JR8u5BKHs48YUbLsqRsh2N7?usp=sharing\">\n    <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\">\n  </a>\n</p>\n\n---\n\n> \u26a0\ufe0f CyteType is under active development and breaking changes may be introduced. Please work with the latest version to ensure compatibility and access to new features.\n\n**CyteType** is a Python package for deep characterization of cell clusters from single-cell RNA-seq data. 
This package interfaces with Anndata objects to call CyteType API.\n\n## Table of Contents\n\n- [Example Report](#example-report)\n- [Quick Start](#quick-start)\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Required Preprocessing](#required-preprocessing)\n  - [Annotation](#annotation)\n- [Configuration Options](#configuration-options)\n  - [Initialization Parameters](#initialization-parameters)\n  - [Submitting Annotation job](#submitting-annotation-job)\n  - [Custom LLM Configuration](#custom-llm-configuration)\n  - [Custom LLM Configuration (Ollama)](#custom-llm-configuration-ollama)\n  - [Advanced parameters](#advanced-parameters)\n- [Annotation Process](#annotation-process)\n  - [Core Functionality](#core-functionality)\n  - [Advanced Context Generation](#advanced-context-generation)\n  - [Result Format](#result-format)\n  - [Advanced Result Components](#advanced-result-components)\n- [Development](#development)\n  - [Setup](#setup)\n  - [Exception Handling](#exception-handling)\n  - [Testing](#testing)\n- [License](#license)\n\n## Example Report\n\nView a sample annotation report: <a href=\"https://nygen-labs-prod--cytetype-api.modal.run/report/77069508-d9f1-4a79-bdab-5870fc3ccdf3?v=250722\" target=\"blank\">CyteType Report</a>\n\n\n\n## Quick Start\n\n```python\nimport anndata\nimport scanpy as sc\nimport cytetype\n\n# Load and preprocess your data\nadata = anndata.read_h5ad(\"path/to/your/data.h5ad\")\nsc.pp.normalize_total(adata, target_sum=1e4)\nsc.pp.log1p(adata)\nsc.pp.pca(adata)\nsc.pp.neighbors(adata)\nsc.tl.leiden(adata, key_added = \"clusters\") \nsc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')\n\n# Initialize CyteType (performs data preparation)\nannotator = cytetype.CyteType(adata, group_key='clusters')\n\n# Run annotation\nadata = annotator.run(\n    study_context=\"Human brain tissue from Alzheimer's disease patients\"\n)\n\n# View 
results\nprint(adata.obs.cytetype_annotation_clusters)\nprint(adata.obs.cytetype_cellOntologyTerm_clusters)\n```\n\n## Installation\n\n```bash\npip install cytetype\n```\n\n## Usage\n\n### Required Preprocessing\n\nYour `AnnData` object must have:\n\n- Log-normalized expression data in `adata.X`\n- Cluster labels in `adata.obs` \n- Differential expression results from `sc.tl.rank_genes_groups`\n\n```python\nimport scanpy as sc\n\n# Standard preprocessing\nsc.pp.normalize_total(adata, target_sum=1e4)\nsc.pp.log1p(adata)\n\n# Clustering\nsc.pp.pca(adata)\nsc.pp.neighbors(adata)\nsc.tl.leiden(adata, key_added='clusters')\n\n# Differential expression (required)\nsc.tl.rank_genes_groups(adata, groupby='clusters', method='t-test')\n```\n\n### Annotation\n\n```python\nfrom cytetype import CyteType\n\n# Initialize (data preparation happens here)\nannotator = CyteType(adata, group_key='clusters')\n\n# Run annotation\nadata = annotator.run(\n    study_context=\"Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. Samples include cortical and hippocampal regions.\"\n)\n\n# Or with custom metadata for tracking\nadata = annotator.run(\n    study_context=\"Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. 
Samples include cortical and hippocampal regions.\",\n    metadata={\n        'experiment_name': 'Brain_AD_Study',\n        'run_label': 'initial_analysis'\n    }\n)\n\n# Results are stored in:\n# - adata.obs.cytetype_annotation_clusters (cell type annotations)\n# - adata.obs.cytetype_cellOntologyTerm_clusters (cell ontology terms)\n# - adata.uns['cytetype_results'] (full API response)\n```\n\n# Configuration Options\n\n## Initialization Parameters\n\n```python\nannotator = CyteType(\n    adata,\n    group_key='leiden',                    # Required: cluster column name\n    rank_key='rank_genes_groups',          # DE results key\n    gene_symbols_column='gene_symbols',    # Gene symbols column\n    n_top_genes=50,                        # Top marker genes per cluster\n    aggregate_metadata=True,               # Aggregate metadata\n    min_percentage=10,                     # Min percentage for cluster context\n    pcent_batch_size=2000,                 # Batch size for calculations\n    coordinates_key='X_umap',              # Coordinates key for visualization\n    max_cells_per_group=1000,              # Max cells per group for visualization \n)\n```\n\n## Submitting Annotation job\n\nThe `run` method accepts several configuration parameters to control the annotation process:\n\n```python\nannotator.run(\n    study_context=\"Adult human brain tissue samples from healthy controls and Alzheimer's disease patients, analyzed using 10X Genomics single-cell RNA-seq. 
Samples include cortical and hippocampal regions.\",\n    metadata={\n        'experiment_name': 'Brain_AD_Study',\n        'run_label': 'initial_analysis'\n    },\n    save_query=True,\n    query_filename=\"query.json\",\n    show_progress=True,\n)\n```\n\n### Custom LLM Configuration\n\nThe CyteType API provides access to some chosen LLM providers by default.\nUsers can choose to provide their own LLM models and model providers.\nMany models can be provided simultaneously, and then they will be used iteratively for each of the clusters.\n\n```python\nadata = annotator.run(\n    study_context=\"Human PBMC from COVID-19 patients\",\n    llm_configs=[{\n        'provider': 'openai',\n        'name': 'gpt-4o-mini',\n        'apiKey': 'your-api-key',\n        'baseUrl': 'https://api.openai.com/v1',  # Optional\n        'modelSettings': {                       # Optional\n            'temperature': 0.0,\n            'max_tokens': 4096\n        }  \n    }],\n)\n```\n\n#### Rate Limits\n\nIf you do not provide your own model providers, then the CyteType API implements rate limiting for fair usage:\n\n- Annotation submissions: 5 RPD\n- Reannotation: 10 RPD\n- Report retrieval: 20 RPM\n\nIf you exceed rate limits, the system will return appropriate error messages with retry timing information\n\nSupported providers: `openai`, `anthropic`, `google`, `xai`, `groq`, `mistral`, `openrouter`, `bedrock`\n\n### Custom LLM Configuration (Ollama)\n\nThe CyteType API supports Ollama models as well. You will need to expose your Ollama server to the internet using a tunneling service. 
Refer to the [OLLAMA.md](./OLLAMA.md) file for instructions on how to do this.\n\n### Advanced parameters\n\n```python\nadata = annotator.run(\n    ...\n    # API polling and timeout settings\n    poll_interval_seconds=30,           # How often to check for results (default)\n    timeout_seconds=7200,               # Max wait time (default: 2 hours)\n    \n    # API configuration\n    api_url=\"https://custom-api.com\",   # Custom API endpoint\n    auth_token=\"your-auth-token\",       # Authentication token\n)\n```\n\n### Authentication and Authorization\n\nYou can provide your own token to the `run` method using the `auth_token` parameter. This will be included in the Authorization header as \"Bearer {auth_token}\". All API requests will be authenticated with this token.\n\n## Annotation Process\n\nCyteType performs comprehensive cell type annotation through an automated pipeline:\n\n### Core Functionality\n\n- **Automated Annotation**: Identifies likely cell types for each cluster based on marker genes\n- **Ontology Mapping**: Maps identified cell types to Cell Ontology terms (e.g., `CL_0000127`)  \n- **Review & Justification**: Analyzes supporting/conflicting markers and assesses confidence\n- **Literature Search**: Searches for relevant literature to support the annotation\n\n### Advanced Context Generation\n\nCyteType generates detailed contextual information to inform annotations:\n\n**Dataset-Level Context**: Comprehensive analysis of experimental metadata:\n```\n\"This dataset originates from multiple human tissues including adrenal gland, \nbrain, liver, lung, lymph node, and pleural effusion, with samples derived \nfrom both healthy individuals and patients diagnosed with lung adenocarcinoma \nor small cell lung carcinoma. 
Experimental data was generated via 10X Genomics \nChromium 3' single-cell sequencing, which may introduce platform-specific \ntechnical artifacts.\"\n```\n\n**Cluster-Specific Context**: Detailed metadata analysis for each cluster:\n```\n\"Cluster 1 comprises 99% lung-derived cells, with 65% originating from lung \nadenocarcinoma samples and 33% from normal tissue. The cells are distributed \nacross two primary donors with demographic characteristics including 67% \nfemale donors and 97% self-reported European ethnicity. Treatment conditions \ninclude Platinum Doublet (55%) and Naive (44%).\"\n```\n\nThis contextual information enables more accurate annotations by considering:\n- **Tissue Origins**: Multi-tissue datasets with precise anatomical mapping\n- **Disease States**: Healthy vs. pathological conditions with treatment history\n- **Technical Factors**: Sequencing platforms, batch effects, and processing methods\n- **Demographics**: Age, sex, and ethnicity distributions\n- **Treatment Context**: Therapeutic interventions and their potential cellular effects\n\n### Result Format\n\nResults include comprehensive annotations for each cluster with expert-level analysis:\n\n```python\n# Access results after annotation using the helper method\nresults = annotator.get_results()\n\n# Or access directly from the stored JSON string\nimport json\nresults = json.loads(adata.uns['cytetype_results']['result'])\n\n# Each annotation includes comprehensive information:\nfor annotation in results['annotations']:\n    print(f\"Cluster: {annotation['clusterId']}\")\n    print(f\"Cell Type: {annotation['annotation']}\")\n    print(f\"Granular Annotation: {annotation['granularAnnotation']}\")\n    print(f\"Cell State: {annotation['cellState']}\")\n    print(f\"Confidence: {annotation['confidence']}\")\n    print(f\"Ontology Term: {annotation['ontologyTerm']}\")\n    print(f\"Is Heterogeneous: {annotation['isHeterogeneous']}\")\n    \n    # Supporting evidence and conflicts\n    
print(f\"Supporting Markers: {annotation['supportingMarkers']}\")\n    print(f\"Conflicting Markers: {annotation['conflictingMarkers']}\")\n    print(f\"Missing Expression: {annotation['missingExpression']}\")\n    print(f\"Unexpected Expression: {annotation['unexpectedExpression']}\")\n    \n    # Expert review and justification\n    print(f\"Justification: {annotation['justification']}\")\n    print(f\"Review Comments: {annotation['reviewComments']}\")\n    print(f\"Feedback: {annotation['feedback']}\")\n    \n    # Similarity and literature support\n    print(f\"Similar Clusters: {annotation['similarity']}\")\n    print(f\"Corroborating Papers: {len(annotation['corroboratingPapers']['papers'])} papers\")\n    \n    # Model usage and performance metrics\n    print(f\"Models Used: {annotation['llmModels']}\")\n    print(f\"Total Processing Time: {annotation['usageInfo']['total_runtime_seconds']:.1f}s\")\n    print(f\"Total Tokens: {annotation['usageInfo']['total_tokens']}\")\n```\n\n#### Advanced Result Components\n\n**Expert Review System**: Each annotation undergoes multi-stage review with detailed feedback:\n- **Review Comments**: Expert-level biological interpretation and mechanistic insights\n- **Confidence Assessment**: Moderate/High confidence based on marker evidence\n- **Feedback Loop**: Iterative refinement based on biological plausibility\n- **Mechanistic Analysis**: Discussion of signaling pathways, developmental biology, and disease pathogenesis\n\n**Literature Integration**: Automatic literature search provides supporting evidence:\n- **Corroborating Papers**: Relevant publications with PMIDs and summaries\n- **Biological Context**: Integration of current research to validate annotations\n\nExample corroborating papers:\n```python\npapers = annotation['corroboratingPapers']['papers']\nfor paper in papers:\n    print(f\"Title: {paper['title']}\")\n    print(f\"PMID: {paper['pmid']}\")\n    print(f\"Journal: {paper['journal']} ({paper['year']})\")\n    
print(f\"Summary: {paper['summary']}\")\n```\n\nSample output:\n```\nTitle: YAP regulates alveolar epithelial cell differentiation and AGER via NFIB/KLF5/NKX2-1\nPMID: 34466790\nJournal: iScience (2021)\nSummary: Documents atypical HOPX+AGER+SFTPC+ 'dual-positive' alveolar cells that \npersist in mature lungs, directly validating the mixed AT1/AT2 phenotype observed \nin malignant clusters.\n```\n\n**Marker Analysis**: Comprehensive evaluation of gene expression patterns:\n- **Supporting Markers**: Genes that strongly support the annotation\n- **Conflicting Markers**: Genes that challenge the annotation with explanations\n- **Missing/Unexpected Expression**: Detailed analysis of expression anomalies with biological explanations\n\nExample unexpected expression analysis:\n```\n\"Expression of AT2 markers (SFTPC, SFTPB) and club cell marker (SCGB1A1) \nin a cluster with strong AT1 markers\" \n\u2192 Explained by: \"dedifferentiation process in cancer where transformed \nepithelial cells exhibit aberrant co-expression of markers from multiple \nlineages due to pathological plasticity\"\n```\n\n**Performance Metrics**: Detailed usage statistics for transparency:\n- **Model Information**: Which LLM models were used for each analysis step\n- **Runtime Statistics**: Processing time and token usage per cluster\n- **Annotation Attempts**: Number of refinement iterations\n\n#### Example Annotations\n\nCyteType provides sophisticated, multi-layered annotations:\n\n**Basic Cell Type**: `\"B-cell\"`, `\"Lung Adenocarcinoma Cell\"`\n\n**Granular Annotations**: Detailed phenotypic descriptions:\n- `\"AGER-positive, HOPX-positive, KRT19-positive lung adenocarcinoma cell with mixed AT1/AT2 phenotype\"`\n- `\"EMT-transitioned, pleural metastasis-competent adenocarcinoma cell with platinum-induced stress phenotype\"`\n- `\"CD74-high activated tumor-infiltrating B-cell in lung adenocarcinoma microenvironment\"`\n\n**Cell States**: Functional and pathological states:\n- 
`\"Transformed\"`, `\"Malignant\"`, `\"Activated\"`, `\"EMT-transitioned and stressed\"`\n\n**Expert Review Comments**: Detailed mechanistic insights:\n```\n\"The mixed AT1/AT2 phenotype observed in this malignant cluster exemplifies \nthe pathological dedifferentiation characteristic of lung adenocarcinoma, \nbut the degree of lineage promiscuity suggests exceptional cellular plasticity \nbeyond typical adenocarcinoma patterns. This may indicate activation of \nprimitive developmental pathways like Wnt/\u03b2-catenin signaling...\"\n```\n\n## Development\n\n### Setup\n\n```bash\ngit clone https://github.com/NygenAnalytics/cytetype.git\ncd cytetype\nuv sync --all-extras\nuv run pip install -e .\n```\n\n### Exception Handling\n\nThe package defines several custom exceptions for different error scenarios:\n\n- **`CyteTypeError`**: Base exception class for all CyteType-related errors\n- **`CyteTypeAPIError`**: Raised for errors during API communication (network issues, invalid responses)\n- **`CyteTypeTimeoutError`**: Raised when API requests time out\n- **`CyteTypeJobError`**: Raised when the API reports an error for a specific job\n\n### Testing\n\n```bash\nuv run pytest              # Run tests\nuv run ruff check .        # Linting\nuv run ruff format .       # Formatting\nuv run mypy .              # Type checking\n```\n\n## License\n\nLicensed under CC BY-NC-SA 4.0 - see [LICENSE](LICENSE) for details.\n",
    "bugtrack_url": null,
    "license": "CC BY-NC-SA 4.0",
    "summary": "Python client for characterization of clusters from single-cell RNA-seq data.",
    "version": "0.8.3",
    "project_urls": {
        "Homepage": "https://github.com/NygenAnalytics/cytetype",
        "Issues": "https://github.com/NygenAnalytics/cytetype/issues",
        "Repository": "https://github.com/NygenAnalytics/cytetype"
    },
    "split_keywords": [
        "bioinformatics",
        " single-cell",
        " rna-seq",
        " annotation",
        " cell types"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bd36fa1f139072265b9ce38576276883dd6efabf1a625865bd560a866885492f",
                "md5": "0d1cb3730184ffe34d3d0e84012d95b4",
                "sha256": "6a4cd46774e70492d092c581fdc0cd6022e77283e85095790b78025694ef929f"
            },
            "downloads": -1,
            "filename": "cytetype-0.8.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0d1cb3730184ffe34d3d0e84012d95b4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 45043,
            "upload_time": "2025-07-30T14:24:38",
            "upload_time_iso_8601": "2025-07-30T14:24:38.007268Z",
            "url": "https://files.pythonhosted.org/packages/bd/36/fa1f139072265b9ce38576276883dd6efabf1a625865bd560a866885492f/cytetype-0.8.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "98fdf3e8b1b71d38b0eb50f38ee01854275035c1ab1690abc577dbb91fdcfffe",
                "md5": "1d4273ded76cc41fd6de5a443d38986e",
                "sha256": "e05acde70ef1a5b351f663ded43985e278cda6ec96929bdf94a9f61353d95ada"
            },
            "downloads": -1,
            "filename": "cytetype-0.8.3.tar.gz",
            "has_sig": false,
            "md5_digest": "1d4273ded76cc41fd6de5a443d38986e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 44782,
            "upload_time": "2025-07-30T14:24:39",
            "upload_time_iso_8601": "2025-07-30T14:24:39.296551Z",
            "url": "https://files.pythonhosted.org/packages/98/fd/f3e8b1b71d38b0eb50f38ee01854275035c1ab1690abc577dbb91fdcfffe/cytetype-0.8.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-30 14:24:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NygenAnalytics",
    "github_project": "cytetype",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "cytetype"
}
        